US 12,031,814 B2
System for interacting with machines using natural language input
Shiwali Mohan, Palo Alto, CA (US); Matthew Klenk, San Francisco, CA (US); Matthew Shreve, Campbell, CA (US); Aaron Ang, Mountain View, CA (US); John Turner Maxwell, III, Santa Clara, CA (US); and Kent Evans, Cupertino, CA (US)
Assigned to Xerox Corporation, Norwalk, CT (US)
Filed by PALO ALTO RESEARCH CENTER INCORPORATED, Palo Alto, CA (US)
Filed on Nov. 3, 2021, as Appl. No. 17/518,429.
Claims priority of provisional application 63/231,682, filed on Aug. 10, 2021.
Prior Publication US 2023/0048827 A1, Feb. 16, 2023
Int. Cl. G01B 7/00 (2006.01); B25J 9/16 (2006.01); B25J 13/00 (2006.01); B25J 13/08 (2006.01); G01B 7/02 (2006.01); G05B 15/02 (2006.01); G06F 18/2137 (2023.01); G06F 40/10 (2020.01); G06F 40/40 (2020.01)
CPC G01B 7/003 (2013.01) [B25J 9/163 (2013.01); B25J 9/1664 (2013.01); B25J 13/003 (2013.01); B25J 13/089 (2013.01); G01B 7/023 (2013.01); G05B 15/02 (2013.01); G06F 18/21375 (2023.01); G06F 40/10 (2020.01); G06F 40/40 (2020.01)]
20 Claims
 
1. A method, comprising:
obtaining sensor data from a set of imaging sensor devices, wherein the sensor data comprises image data indicative of a set of objects detected within an environment in proximity to a set of mechanical systems configured to interact with the set of objects;
generating a state graph based on the sensor data, wherein:
the state graph represents the set of objects and a set of positions of the set of objects within the environment;
the state graph comprises a set of object nodes to represent the set of objects; and
the state graph comprises a set of property nodes to represent a set of properties of the set of objects;
obtaining user input data, wherein the user input data is generated based on a natural language input; and
updating the state graph based on the user input data to generate an enhanced state graph, wherein the enhanced state graph comprises the set of object nodes, the set of property nodes, and a goal node generated based on the user input data, wherein the goal node indicates an objective;
generating a set of instructions for the set of mechanical systems based on the enhanced state graph; and
operating the set of mechanical systems to interact with one or more of the set of objects within the environment based on the set of instructions;
after operating the set of mechanical systems, obtaining additional sensor data from the set of imaging sensor devices and updating the set of property nodes of the enhanced state graph to generate an updated enhanced state graph based on the additional sensor data to indicate changes to the set of properties of the set of objects caused by the set of mechanical systems;
determining whether the objective has been met based on the updated enhanced state graph; and
responsive to determining that the objective has not been met, generating a new set of instructions for the set of mechanical systems based on the updated enhanced state graph.
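The sense, plan, act, and re-check loop recited in claim 1 can be illustrated with a small sketch. This is not the patented implementation. Every name here (StateGraph, SimulatedRobot, add_goal, plan, run) is hypothetical. The "move <object> to <place>" parser is a toy stand-in for natural language processing, and a dictionary stands in for the imaging sensors and mechanical systems.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StateGraph:
    # Object nodes keyed by id; each maps property names to values (property nodes).
    objects: dict = field(default_factory=dict)
    # Goal node as (object id, property name, target value); None until user input arrives.
    goal: Optional[tuple] = None


def build_state_graph(detections):
    """Create object and property nodes from (simulated) image detections."""
    graph = StateGraph()
    for det in detections:
        graph.objects[det["id"]] = {"position": det["position"]}
    return graph


def add_goal(graph, user_input):
    """Toy parser (assumption): accepts only 'move <object> to <place>'."""
    words = user_input.lower().split()
    graph.goal = (words[1], "position", words[3])
    return graph  # the "enhanced" state graph


def goal_met(graph):
    obj, prop, target = graph.goal
    return graph.objects[obj].get(prop) == target


def plan(graph):
    """Generate instructions from the enhanced state graph (trivial one-step plan)."""
    obj, _prop, target = graph.goal
    return [("move", obj, target)]


class SimulatedRobot:
    """Stand-in for sensors and mechanical systems; drops the first `failures` moves."""

    def __init__(self, world, failures=0):
        self.world = dict(world)
        self.failures_left = failures

    def sense(self):
        return [{"id": k, "position": v} for k, v in self.world.items()]

    def execute(self, instructions):
        for _op, obj, target in instructions:
            if self.failures_left > 0:
                self.failures_left -= 1  # simulate a failed grasp or move
                continue
            self.world[obj] = target


def run(robot, user_input, max_rounds=5):
    graph = add_goal(build_state_graph(robot.sense()), user_input)
    instructions = plan(graph)
    rounds = 0
    while rounds < max_rounds:
        robot.execute(instructions)
        rounds += 1
        # Re-sense and update property nodes to reflect changes caused by the robot.
        for det in robot.sense():
            graph.objects[det["id"]]["position"] = det["position"]
        if goal_met(graph):
            break
        instructions = plan(graph)  # objective not met: generate new instructions
    return graph, rounds
```

For example, `run(SimulatedRobot({"cup": "table"}, failures=1), "move cup to shelf")` fails on the first attempt, re-senses, replans, and succeeds on the second round.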
 
13. An apparatus, comprising:
a memory to store a binary code of a controller; and
a processing device, operatively coupled to the memory, to:
obtain sensor data from a set of imaging sensor devices, wherein the sensor data comprises image data indicative of a set of objects detected within an environment in proximity to a set of mechanical systems configured to interact with the set of objects;
generate a state graph based on the sensor data, wherein:
the state graph represents the set of objects and a set of positions of the set of objects within the environment;
the state graph comprises a set of object nodes to represent the set of objects; and
the state graph comprises a set of property nodes to represent a set of properties of the set of objects;
obtain user input data, wherein the user input data is generated based on a natural language input; and
update the state graph based on the user input data to generate an enhanced state graph, wherein the enhanced state graph comprises the set of object nodes, the set of property nodes, and a goal node generated based on the user input data, wherein the goal node indicates an objective;
generate a set of instructions for the set of mechanical systems based on the enhanced state graph; and
operate the set of mechanical systems to interact with one or more of the set of objects within the environment based on the set of instructions;
after operating the set of mechanical systems, obtain additional sensor data from the set of imaging sensor devices and update the set of property nodes of the enhanced state graph to generate an updated enhanced state graph based on the additional sensor data to indicate changes to the set of properties of the set of objects caused by the set of mechanical systems;
determine whether the objective has been met based on the updated enhanced state graph; and
responsive to determining that the objective has not been met, generate a new set of instructions for the set of mechanical systems based on the updated enhanced state graph.
 
20. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed by a processing device, cause the processing device to:
obtain sensor data from a set of imaging sensor devices, wherein the sensor data comprises image data indicative of a set of objects detected within an environment in proximity to a set of mechanical systems configured to interact with the set of objects;
generate a state graph based on the sensor data, wherein:
the state graph represents the set of objects and a set of positions of the set of objects within the environment;
the state graph comprises a set of object nodes to represent the set of objects; and
the state graph comprises a set of property nodes to represent a set of properties of the set of objects;
obtain user input data, wherein the user input data is generated based on a natural language input; and
update the state graph based on the user input data to generate an enhanced state graph, wherein the enhanced state graph comprises the set of object nodes, the set of property nodes, and a goal node generated based on the user input data, wherein the goal node indicates an objective;
generate a set of instructions for the set of mechanical systems based on the enhanced state graph; and
operate the set of mechanical systems to interact with one or more of the set of objects within the environment based on the set of instructions;
after operating the set of mechanical systems, obtain additional sensor data from the set of imaging sensor devices and update the set of property nodes of the enhanced state graph to generate an updated enhanced state graph based on the additional sensor data to indicate changes to the set of properties of the set of objects caused by the set of mechanical systems;
determine whether the objective has been met based on the updated enhanced state graph; and
responsive to determining that the objective has not been met, generate a new set of instructions for the set of mechanical systems based on the updated enhanced state graph.