| CPC B25J 9/161 (2013.01) [B25J 9/1697 (2013.01); B25J 19/023 (2013.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06T 7/55 (2017.01); G06T 7/75 (2017.01); G06V 20/64 (2022.01)] | 16 Claims |

|
1. A method to implement a visual object parsing system, the method comprising:
generating a first neural network system comprising a plurality of first neural networks, each of the plurality of first neural networks having a first neural network input, a first neural network output, and a unique set of first neural network weights;
generating a second neural network system comprising a plurality of second neural networks, each of the plurality of second neural networks having a second neural network input, a second neural network output, and a unique set of second neural network weights;
training the plurality of first neural networks to each output a first set of point cloud mappings at the respective first neural network output, wherein the first neural networks are trained with a first training dataset comprising a set of image information representing a view of a scene and a set of visual movement information associated with the view of the scene;
selecting one of the first neural networks based on a first set of criteria;
coupling the first neural network output of the selected first neural network to an input of a spatial memory system, wherein the spatial memory system integrates the first set of point cloud mappings outputted by the selected first neural network over time into a second set of point cloud mappings using position and orientation information associated with the view of the scene;
training the plurality of second neural networks to each output a set of object clusters at the respective second neural network output, wherein the second neural networks are trained with a second training dataset comprising the second set of point cloud mappings outputted by the spatial memory system; and
selecting one of the second neural networks based on a second set of criteria, wherein an object parsing output representing at least one three-dimensional object is generated based at least in part on the set of object clusters outputted by the selected second neural network.
|
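For illustration only, the selection and spatial memory steps recited in claim 1 can be sketched as follows. This is a minimal sketch and not the claimed implementation. The names `select_best`, `to_world`, and `SpatialMemory` are hypothetical. The sketch assumes that the "set of criteria" reduces to a single numeric score per network, and that the position and orientation information is a translation plus a yaw angle about the vertical axis. The claim does not specify either of these.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def select_best(candidates: Sequence, score: Callable) -> object:
    """Return the candidate with the highest score.

    Assumption: the claimed 'set of criteria' is modeled as one scalar score.
    """
    return max(candidates, key=score)


def to_world(points: Sequence[Point], yaw: float, position: Point) -> List[Point]:
    """Rotate view-frame points about the z axis by yaw, then translate by position."""
    c, s = math.cos(yaw), math.sin(yaw)
    px, py, pz = position
    return [(c * x - s * y + px, s * x + c * y + py, z + pz) for x, y, z in points]


class SpatialMemory:
    """Hypothetical accumulator that merges per-view point clouds over time."""

    def __init__(self) -> None:
        self.points: List[Point] = []

    def integrate(self, points: Sequence[Point], yaw: float, position: Point) -> List[Point]:
        # Express the new view in the common frame, then append it to the
        # accumulated cloud (the "second set of point cloud mappings").
        self.points.extend(to_world(points, yaw, position))
        return self.points


# Usage: pick one of several trained candidates by a validation score,
# then integrate two views taken from different poses.
validation_scores = {"net_a": 0.71, "net_b": 0.84}  # made-up illustrative values
chosen = select_best(list(validation_scores), validation_scores.get)

memory = SpatialMemory()
memory.integrate([(1.0, 0.0, 0.0)], yaw=0.0, position=(0.0, 0.0, 0.0))
merged = memory.integrate([(1.0, 0.0, 0.0)], yaw=math.pi / 2, position=(1.0, 0.0, 0.0))
```

In this sketch the merged cloud would then serve as training input for the second plurality of networks, from which one network is again selected in the same manner.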