US 12,263,590 B2
Machine vision parsing of three-dimensional environments employing neural networks
Rodrigo Furlan, Delta (CA)
Assigned to Sanctuary Cognitive Systems Corporation, Vancouver (CA)
Filed by Sanctuary Cognitive Systems Corporation, Vancouver (CA)
Filed on Oct. 1, 2020, as Appl. No. 17/061,187.
Claims priority of provisional application 62/927,485, filed on Oct. 29, 2019.
Prior Publication US 2021/0122035 A1, Apr. 29, 2021
Int. Cl. G06N 3/04 (2023.01); B25J 9/16 (2006.01); B25J 19/02 (2006.01); G06N 3/045 (2023.01); G06T 7/55 (2017.01); G06T 7/73 (2017.01); G06V 20/64 (2022.01)
CPC B25J 9/161 (2013.01) [B25J 9/1697 (2013.01); B25J 19/023 (2013.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06T 7/55 (2017.01); G06T 7/75 (2017.01); G06V 20/64 (2022.01)]
16 Claims
1. A method to implement a visual object parsing system, the method comprising:
generating a first neural network system comprising a plurality of first neural networks, each of the plurality of first neural networks having a first neural network input, a first neural network output, and a unique set of first neural network weights;
generating a second neural network system comprising a plurality of second neural networks, each of the plurality of second neural networks having a second neural network input, a second neural network output, and a unique set of second neural network weights;
training the plurality of first neural networks to each output a first set of point cloud mappings at the respective first neural network output, wherein the first neural networks are trained with a first training dataset comprising a set of image information representing a view of a scene and a set of visual movement information associated with the view of the scene;
selecting one of the first neural networks based on a first set of criteria;
coupling the first neural network output of the selected first neural network to an input of a spatial memory system, wherein the spatial memory system integrates the first set of point cloud mappings outputted by the selected first neural network over time into a second set of point cloud mappings using position and orientation information associated with the view of the scene;
training the plurality of second neural networks to each output a set of object clusters at the respective second neural network output, wherein the second neural networks are trained with a second training dataset comprising the second set of point cloud mappings outputted by the spatial memory system; and
selecting one of the second neural networks based on a second set of criteria, wherein an object parsing output representing at least one three-dimensional object is generated based at least in part on the set of object clusters outputted by the selected second neural network.
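The claim does not say how the spatial memory system combines point clouds or how a network is selected, so the following is only an illustrative sketch of two of the recited steps under simplifying assumptions. It shows per-view point clouds being accumulated into a shared frame using position and orientation information, and one candidate being picked from several by a scoring criterion. The names `Pose`, `SpatialMemory`, `to_world`, and `select_best` are hypothetical, the pose is reduced to a translation plus a yaw angle, and the "validation error" criterion is an assumption, not something stated in the claim.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple, TypeVar

Point = Tuple[float, float, float]
T = TypeVar("T")


@dataclass
class Pose:
    """Hypothetical view pose: translation plus yaw about the z axis (radians)."""
    x: float
    y: float
    z: float
    yaw: float


def to_world(points: Sequence[Point], pose: Pose) -> List[Point]:
    """Rotate view-frame points by the yaw angle, then translate them."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return [
        (c * px - s * py + pose.x, s * px + c * py + pose.y, pz + pose.z)
        for px, py, pz in points
    ]


@dataclass
class SpatialMemory:
    """Accumulates per-view point clouds into one world-frame point cloud."""
    points: List[Point] = field(default_factory=list)

    def integrate(self, view_points: Sequence[Point], pose: Pose) -> List[Point]:
        # Assumption: integration is plain accumulation, with no
        # de-duplication or filtering of overlapping points.
        self.points.extend(to_world(view_points, pose))
        return list(self.points)


def select_best(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Pick the candidate with the highest score (one possible 'set of criteria')."""
    return max(candidates, key=score)


# Two views of the same local point, taken with different orientations.
memory = SpatialMemory()
memory.integrate([(1.0, 0.0, 0.0)], Pose(0.0, 0.0, 0.0, 0.0))
merged = memory.integrate([(1.0, 0.0, 0.0)], Pose(0.0, 0.0, 0.0, math.pi / 2))

# Hypothetical candidates described only by a name and a validation error.
candidates = [
    {"name": "a", "val_error": 0.30},
    {"name": "b", "val_error": 0.12},
    {"name": "c", "val_error": 0.25},
]
chosen = select_best(candidates, score=lambda n: -n["val_error"])
```

In this sketch the second view's point lands near (0, 1, 0) in the shared frame because the view is rotated a quarter turn, and `chosen` is the candidate with the lowest assumed validation error. A full pose would normally use a 3D rotation rather than yaw alone.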