US 11,657,419 B2
Systems and methods for building a virtual representation of a location
Marc Eder, San Diego, CA (US); Siddharth Mohan, San Diego, CA (US); Maciej Halber, San Diego, CA (US); Anoop Jakka, San Diego, CA (US); Devin Waltman, San Diego, CA (US); and Zachary Rattner, San Diego, CA (US)
Assigned to YEMBO, INC., San Diego, CA (US)
Filed by Yembo, Inc., San Diego, CA (US)
Filed on Mar. 5, 2021, as Appl. No. 17/194,075.
Claims priority of provisional application 62/986,061, filed on Mar. 6, 2020.
Prior Publication US 2021/0279957 A1, Sep. 9, 2021
Int. Cl. G06T 7/50 (2017.01); G06T 7/20 (2017.01); G06Q 30/0204 (2023.01); G06Q 40/08 (2012.01); G06F 16/29 (2019.01); G06N 20/00 (2019.01); G06Q 10/0875 (2023.01); G06Q 10/0639 (2023.01); G06T 7/00 (2017.01); G06Q 30/0201 (2023.01); G06N 3/08 (2023.01); G06T 17/00 (2006.01); G06T 7/80 (2017.01); G06F 3/04842 (2022.01); G06T 17/20 (2006.01); G06V 20/00 (2022.01); G06N 3/044 (2023.01); H04N 23/63 (2023.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 20/64 (2022.01); G01S 17/06 (2006.01); G06Q 50/16 (2012.01); G01S 19/42 (2010.01); G06F 18/24 (2023.01)
CPC G06Q 30/0205 (2013.01) [G06F 3/04842 (2013.01); G06F 16/29 (2019.01); G06N 3/044 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06Q 10/06395 (2013.01); G06Q 10/0875 (2013.01); G06Q 30/0206 (2013.01); G06Q 40/08 (2013.01); G06T 7/0002 (2013.01); G06T 7/20 (2013.01); G06T 7/50 (2017.01); G06T 7/80 (2017.01); G06T 17/00 (2013.01); G06T 17/20 (2013.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 20/00 (2022.01); G06V 20/64 (2022.01); H04N 23/631 (2023.01); G01S 17/06 (2013.01); G01S 19/42 (2013.01); G06F 18/24 (2023.01); G06Q 50/16 (2013.01); G06T 2207/20081 (2013.01)] 32 Claims
OG exemplary drawing
 
1. A system configured to generate a virtual representation of a location with spatially localized information of elements within the location being embedded in the virtual representation, the system comprising one or more hardware processors configured by machine-readable instructions to:
receive description data of a location, the description data being generated via at least one of a camera, a user interface, an environment sensor, and an external location information database, the description data comprising a plurality of images and pose matrices;
receive metadata associated with elements within the location;
generate, in real-time, via a machine learning model and/or a geometric model, a 3-dimensional (3D) model of the location and the elements therein, the machine learning model being configured to receive the plurality of images and pose matrices as inputs and predict geometry of the location and the elements therein to form the 3D model; and
generate, based on the 3D model of the location, a virtual representation of the location by annotating the 3D model with spatially localized metadata associated with the elements within the location, and semantic information of the elements within the location, the virtual representation being editable by a user to allow modifications to the spatially localized metadata;
wherein generating the 3D model comprises:
encoding each image of the plurality of images with the machine learning model;
adjusting, based on the encoded images of the plurality of images, an intrinsics matrix associated with the camera;
using the intrinsics matrix and pose matrices to back-project the encoded images into a predefined voxel grid volume;
providing the voxel grid, populated with the back-projected features, as input to a neural network to predict the 3D model of the location for each voxel in the voxel grid; and
extracting a 2D surface of the predicted 3D model.
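For orientation only, the following is a minimal sketch, in PyTorch, of the reconstruction steps recited in claim 1: encoding each image, back-projecting the encoded features into a predefined voxel grid using the pose matrices and an intrinsics matrix adjusted to the encoded-feature resolution, predicting per-voxel geometry with a neural network, and extracting a 2D surface. The module names (ImageEncoder, VoxelHead, back_project), the encoder and 3D-network architectures, the cubic grid size, feature averaging as the aggregation rule, and marching cubes as the surface-extraction step are illustrative assumptions, not the claimed or disclosed implementation.

# Illustrative only: architectures, grid size, and aggregation rule are assumptions.
import torch
import torch.nn as nn
from skimage import measure  # marching cubes for surface extraction


class ImageEncoder(nn.Module):
    """Toy 2D encoder producing a feature map at 1/4 of the input resolution."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, images):               # (N, 3, H, W)
        return self.net(images)              # (N, C, H/4, W/4)


class VoxelHead(nn.Module):
    """Toy 3D CNN predicting a signed-distance value for each voxel."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(feat_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, 1))

    def forward(self, volume):               # (1, C, D, D, D)
        return self.net(volume)              # (1, 1, D, D, D)


def back_project(feats, K, cam_to_world, origin, voxel_size, dim):
    """Average encoded image features into a (C, D, D, D) voxel grid.

    feats:        (N, C, h, w) encoded images
    K:            (3, 3) intrinsics adjusted to the feature-map resolution
    cam_to_world: (N, 4, 4) pose matrices
    origin:       (3,) world-space position of voxel (0, 0, 0)
    """
    N, C, h, w = feats.shape
    idx = torch.arange(dim, dtype=torch.float32)
    zs, ys, xs = torch.meshgrid(idx, idx, idx, indexing="ij")
    centers = torch.stack([xs, ys, zs], -1).reshape(-1, 3) * voxel_size + origin
    centers_h = torch.cat([centers, torch.ones(len(centers), 1)], dim=1)   # (V, 4)

    volume = torch.zeros(C, dim ** 3)
    counts = torch.zeros(dim ** 3)
    for i in range(N):
        cam = (torch.linalg.inv(cam_to_world[i]) @ centers_h.T)[:3]   # camera coords
        z = cam[2].clamp(min=1e-6)                                    # guard divide-by-zero
        u = (K[0, 0] * cam[0] / z + K[0, 2]).round().long()
        v = (K[1, 1] * cam[1] / z + K[1, 2]).round().long()
        valid = (cam[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        volume[:, valid] += feats[i][:, v[valid], u[valid]]           # (C, M)
        counts[valid] += 1
    volume /= counts.clamp(min=1)
    return volume.reshape(C, dim, dim, dim)


# Usage with random stand-in data; real inputs would come from the capture device.
images = torch.rand(4, 3, 128, 128)
poses = torch.eye(4).repeat(4, 1, 1)                          # cam_to_world per image
K = torch.tensor([[40.0, 0.0, 16.0], [0.0, 40.0, 16.0], [0.0, 0.0, 1.0]])

feats = ImageEncoder()(images)
grid = back_project(feats, K, poses,
                    origin=torch.tensor([-1.2, -1.2, 0.0]), voxel_size=0.05, dim=48)
sdf = VoxelHead()(grid.unsqueeze(0))[0, 0]                    # (48, 48, 48) prediction
# The 0-level set would be used with a trained model; the mean is used here only so the
# untrained sketch runs end to end.
verts, faces, _, _ = measure.marching_cubes(sdf.detach().numpy(),
                                            level=float(sdf.mean()))
# A spatially localized annotation could then be keyed to a position on the extracted
# surface (purely illustrative structure for the editable metadata of claim 1).
annotations = {tuple(verts[0]): {"element": "sofa", "note": "fragile"}}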