US 11,721,067 B2
System and method for virtual modeling of indoor scenes from imagery
Brian Totty, Mountain View, CA (US); Kevin Wong, Mountain View, CA (US); Jianfeng Yin, Mountain View, CA (US); Luis Puig Morales, Mountain View, CA (US); Paul Gauthier, Mountain View, CA (US); Salma Jiddi, Mountain View, CA (US); Qiqin Dai, Mountain View, CA (US); Brian Pugh, Mountain View, CA (US); Konstantinos Nektarios Lianos, Mountain View, CA (US); Angus Dorbie, Mountain View, CA (US); Yacine Alami, Mountain View, CA (US); Marc Eder, Mountain View, CA (US); Christopher Sweeney, Mountain View, CA (US); and Javier Civera, Mountain View, CA (US)
Assigned to Geomagical Labs, Inc., Mountain View, CA (US)
Filed by Geomagical Labs, Inc., Mountain View, CA (US)
Filed on Sep. 29, 2021, as Appl. No. 17/488,305.
Application 17/488,305 is a continuation of application No. 16/823,123, filed on Mar. 18, 2020, granted, now 11,170,569.
Claims priority of provisional application 62/819,817, filed on Mar. 18, 2019.
Prior Publication US 2022/0020210 A1, Jan. 20, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 17/05 (2011.01); G06T 7/174 (2017.01); G06T 7/50 (2017.01); G06T 5/50 (2006.01)

CPC G06T 17/05 (2013.01) [G06T 5/50 (2013.01); G06T 7/174 (2017.01); G06T 7/50 (2017.01); G06T 2207/10028 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/20221 (2013.01)]

21 Claims

1. A method executed by one or more computing devices for generating a virtual representation of a physical scene, the method comprising:

receiving scene data corresponding to the physical scene;

processing the scene data to determine a plurality of scene components and a plurality of scene priors corresponding to the plurality of scene components, wherein the plurality of scene priors comprise a plurality of semantic segmentation masks and wherein processing the scene data comprises applying semantic segmentation to the scene data to generate a plurality of semantic segmentation masks, each semantic segmentation mask mapping a group of pixels in the scene data to a semantic label;

generating a plurality of dense geometric representations by inputting the plurality of scene priors into one or more neural networks trained to generate dense geometric representations, wherein each dense geometric representation corresponds to a scene component in the plurality of scene components;

generating a virtual model of the physical scene based at least in part on the plurality of dense geometric representations; and

generating a virtual representation of the physical scene based at least in part on the scene data, the virtual representation being aligned with the virtual model.
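
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every name in it is hypothetical. The sketch assumes the scene data has already been reduced to a toy per-pixel class-id grid (`label_map`), and it replaces the trained neural networks with a stand-in (`placeholder_network`) that returns a constant depth per semantic label. It shows only the ordering of the steps: semantic segmentation masks used as scene priors, one dense geometric representation per scene component, a merged virtual model, and a virtual representation that shares the model's pixel grid.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LabelMap = List[List[int]]   # hypothetical stand-in for scene data: per-pixel class ids
Mask = List[List[bool]]
Depth = List[List[float]]


@dataclass
class ScenePrior:
    label: str   # semantic label, e.g. "wall" or "floor"
    mask: Mask   # semantic segmentation mask: True where the pixel has this label


def segment(label_map: LabelMap, names: Dict[int, str]) -> List[ScenePrior]:
    """Build one boolean mask per class id present, mapping its pixels to a label."""
    priors = []
    for class_id in sorted({c for row in label_map for c in row}):
        mask = [[c == class_id for c in row] for row in label_map]
        priors.append(ScenePrior(names[class_id], mask))
    return priors


def placeholder_network(prior: ScenePrior) -> Depth:
    """Assumed stand-in for a trained network: constant depth inside the mask, 0.0 outside."""
    depth = 2.0 if prior.label == "floor" else 4.0
    return [[depth if inside else 0.0 for inside in row] for row in prior.mask]


def merge(dense: Dict[str, Depth]) -> Depth:
    """Combine per-component depth maps; the masks are disjoint, so summing is safe."""
    layers = list(dense.values())
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(layer[r][c] for layer in layers) for c in range(cols)]
            for r in range(rows)]


def run_pipeline(label_map: LabelMap, names: Dict[int, str],
                 network: Callable[[ScenePrior], Depth] = placeholder_network):
    priors = segment(label_map, names)                 # scene priors (segmentation masks)
    dense = {p.label: network(p) for p in priors}      # one dense geometry per component
    model = merge(dense)                               # virtual model of the scene
    # The representation reuses the scene data's pixel grid, so it is aligned with the model.
    representation = {"scene_data": label_map, "model": model}
    return priors, dense, model, representation


if __name__ == "__main__":
    toy_scene = [[0, 0], [1, 1]]                       # top row wall, bottom row floor
    _, _, model, _ = run_pipeline(toy_scene, {0: "wall", 1: "floor"})
    print(model)
```

Passing the network as a parameter keeps the sketch honest about its biggest assumption: swapping in a real trained model changes only the `network` argument, while the sequence of steps stays as recited in the claim.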