US 12,462,423 B2
Scene embedding for visual navigation
Abel Karl Brown, Dumfries, VA (US); and Robert Stephen DiPietro, Baltimore, MD (US)
Assigned to NVIDIA Corporation, Santa Clara, CA (US)
Filed by NVIDIA Corporation, Santa Clara, CA (US)
Filed on Jan. 19, 2021, as Appl. No. 17/152,308.
Application 17/152,308 is a continuation of application No. 16/216,458, filed on Dec. 11, 2018, granted, now Pat. No. 10,902,616.
Claims priority of provisional application 62/718,302, filed on Aug. 13, 2018.
Prior Publication US 2021/0142491 A1, May 13, 2021
Int. Cl. G06T 7/73 (2017.01); G05D 1/00 (2024.01); G06F 18/214 (2023.01); G06T 7/246 (2017.01); G06V 10/77 (2022.01); G06V 10/82 (2022.01); G06V 20/10 (2022.01); G06V 20/40 (2022.01)

CPC G06T 7/74 (2017.01) [G05D 1/0088 (2013.01); G05D 1/0221 (2013.01); G05D 1/0231 (2013.01); G06F 18/214 (2023.01); G06T 7/248 (2017.01); G06V 10/7715 (2022.01); G06V 10/82 (2022.01); G06V 20/10 (2022.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30241 (2013.01); G06T 2207/30248 (2013.01)]

19 Claims

1. One or more processors, comprising:

circuitry to use one or more neural networks to control direction of a vehicle based, at least in part, on one or more sets of three or more images comprising two or more non-consecutive video frames, and one or more dissimilarities between one or more features of the two or more non-consecutive video frames.

8. A vehicle, comprising:

one or more processors to use one or more neural networks to control direction of the vehicle based, at least in part, on one or more sets of three or more images comprising two or more non-consecutive video frames, and one or more dissimilarities between one or more features of the two or more non-consecutive video frames.

15. A method, comprising:

using one or more neural networks to control direction of a vehicle based, at least in part, on one or more dissimilarities between one or more features of two or more non-consecutive video frames, wherein

the one or more neural networks are to generate one or more topological representations of an environment corresponding to one or more vector space models determined based at least in part upon one or more temporal sequences of one or more images of the environment.
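
The claim language above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that "dissimilarity between features" can be modeled as Euclidean distance between embedding vectors, that a "set of three or more images" can be modeled as an (anchor, positive, negative) triplet in which the anchor and negative are non-consecutive frames, and that a "topological representation" can be modeled as a graph linking frames whose embeddings lie close together. The names `embed`, `sample_triplet`, `triplet_loss`, and `build_topology`, and the `near`, `far`, `margin`, and `threshold` parameters, are hypothetical and do not appear in the patent.

```python
import math
import random


def embed(frame):
    # Hypothetical stand-in for a trained neural network: here the "frame"
    # is already a short list of numbers and is returned as a feature vector.
    return [float(x) for x in frame]


def dissimilarity(a, b):
    # Assumption: dissimilarity is the Euclidean distance between features.
    return math.dist(a, b)


def sample_triplet(num_frames, near=2, far=10, rng=random):
    # Pick an anchor frame, a temporally close "positive" frame, and a
    # temporally distant "negative" frame that is not consecutive with it.
    anchor = rng.randrange(num_frames)
    positives = [i for i in range(num_frames) if 0 < abs(i - anchor) <= near]
    negatives = [i for i in range(num_frames) if abs(i - anchor) >= far]
    if not positives or not negatives:
        raise ValueError("sequence too short for the chosen near/far gaps")
    return anchor, rng.choice(positives), rng.choice(negatives)


def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet margin loss: zero once the negative is at least
    # `margin` farther from the anchor than the positive is.
    gap = dissimilarity(anchor, positive) - dissimilarity(anchor, negative)
    return max(0.0, gap + margin)


def build_topology(embeddings, threshold):
    # Link every pair of frames whose embeddings lie within `threshold`,
    # giving an undirected graph stored as an adjacency list.
    graph = {i: [] for i in range(len(embeddings))}
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if dissimilarity(embeddings[i], embeddings[j]) <= threshold:
                graph[i].append(j)
                graph[j].append(i)
    return graph


# Toy temporal sequence: 20 "frames" taken one unit apart along a corridor.
frames = [[float(i), 0.0] for i in range(20)]
embeddings = [embed(f) for f in frames]

a, p, n = sample_triplet(len(frames), rng=random.Random(0))
loss = triplet_loss(embeddings[a], embeddings[p], embeddings[n])
graph = build_topology(embeddings, threshold=1.5)
```

In this toy sequence the embeddings already separate temporally distant frames, so the loss is zero and the graph reduces to a chain connecting each frame to its immediate neighbors. With a learned embedding, a nonzero loss would be the signal used to adjust the network.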