US 12,462,185 B2
Scene grammar based reinforcement learning in agent training
Robert P. Dooley, Dublin, CA (US); and Dylan James Snow, Concord, CA (US)
Assigned to Accenture Global Solutions Limited, Dublin (IE)
Filed by Accenture Global Solutions Limited, Dublin (IE)
Filed on Aug. 18, 2020, as Appl. No. 16/996,310.
Prior Publication US 2022/0058510 A1, Feb. 24, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 40/211 (2020.01); G06N 3/006 (2023.01); G06N 3/126 (2023.01); G06N 5/04 (2023.01)
CPC G06N 20/00 (2019.01) [G06N 5/04 (2013.01)] 11 Claims
OG exemplary drawing
 
1. A computer-implemented method, comprising:
obtaining a first set of multiple scene grammars that programmatically generate synthesized training environments based on different probabilistic rules that define objects in a synthesized training environment or different relationships between the objects in the synthesized training environment;
generating a second set of scene grammars by (i) introducing random mutations in at least a first subset of the first set of multiple scene grammars or (ii) combining at least a second subset of the first set of multiple scene grammars with one another;
randomly generating multiple candidate synthesized training environments including generating a particular synthesized training environment using a particular scene grammar from among the second set of scene grammars;
initiating training of multiple sample virtual agents in the multiple candidate synthesized training environments, including training a sample virtual agent in the particular synthesized training environment to perform a particular task;
while the sample virtual agent is being trained in the particular synthesized training environment to perform the particular task:
identifying a behavior of the sample virtual agent that is training in the particular synthesized training environment to perform the particular task;
obtaining a reference video of a human agent performing the particular task;
identifying a behavior of the reference human agent that is performing the particular task from the reference video; and
determining that the behavior of the sample virtual agent that is training in the particular synthesized training environment matches the behavior of the reference human agent that is performing the particular task in response to:
determining that an amount of similarity between movement of the reference human agent in the reference video and movement of the sample virtual agent in the particular synthesized training environment is greater than an amount of similarity between movement of the reference human agent in the reference video and movement of the sample virtual agent in a different synthesized training environment generated by a different scene grammar,
providing a first score based on determining the amount of similarity between the movement of the reference human agent in the reference video and the movement of the sample virtual agent in the particular synthesized training environment,
providing a second score based on determining the amount of similarity between the movement of the reference human agent in the reference video and the movement of the sample virtual agent in the different synthesized training environment,
selecting the particular synthesized training environment over the different synthesized training environment based on determining the first score is greater than the second score;
storing one or more indications that the particular synthesized training environment trains virtual agents to perform the particular task, wherein storing the one or more indications further comprises:
storing a first indication that associates (i) the particular scene grammar being used to generate the particular synthesized training environment with (ii) training virtual agents to perform the particular task,
wherein storing the first indication further comprises determining that an amount of similarity determined for the particular synthesized training environment is greater than an amount of similarity determined for a second particular synthesized training environment,
wherein the second particular synthesized training environment is one of the multiple candidate synthesized training environments generated from the second set of scene grammars; and
storing a second indication that associates (i) a second scene grammar being used to generate the second particular synthesized training environment with (ii) training virtual agents to perform the particular task,
wherein storing the second indication further comprises determining that an amount of similarity determined for the particular synthesized training environment and an amount of similarity determined for the second particular synthesized training environment both satisfy a selection criterion;
determining to train a new virtual agent to perform the particular task;
selecting the one or more indications associating either the particular scene grammar, which generates the particular synthesized training environment, or the second scene grammar, which generates the second particular synthesized training environment, with training virtual agents to perform the particular task;
generating a new synthesized training environment using the selected one or more indications, wherein the new synthesized training environment includes one of the particular synthesized training environment or the second particular synthesized training environment; and
initiating training the new virtual agent to perform the particular task using the new synthesized training environment.
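The claimed method can be read as an evolutionary search over scene grammars. The sketch below is purely illustrative, assuming a scene grammar is a dictionary of rule probabilities and that agent and human behavior reduce to numeric movement traces; the function names (`mutate`, `crossover`, `similarity`, `select_grammar`) and the similarity metric are assumptions for exposition, not the patent's actual implementation.

```python
import random

def mutate(grammar, rate=0.2):
    """Introduce random mutations into a grammar's rule probabilities
    (claim step (i)), clamping each probability to [0, 1]."""
    return {rule: min(1.0, max(0.0, p + random.uniform(-rate, rate)))
            for rule, p in grammar.items()}

def crossover(g1, g2):
    """Combine two grammars with one another (claim step (ii)) by
    taking each shared rule from a randomly chosen parent."""
    return {r: (random.choice([g1[r], g2[r]]) if r in g1 and r in g2
                else g1.get(r, g2.get(r)))
            for r in set(g1) | set(g2)}

def similarity(agent_trace, human_trace):
    """Toy movement similarity: 1 / (1 + mean absolute difference),
    so identical traces score 1.0 and diverging traces score lower."""
    diffs = [abs(a - h) for a, h in zip(agent_trace, human_trace)]
    return 1.0 / (1.0 + sum(diffs) / len(diffs))

def select_grammar(candidate_grammars, human_trace, simulate, threshold=0.5):
    """Score each candidate grammar by how human-like the trained agent's
    movement is in the environment it generates, select the best grammar,
    and store indications for every grammar satisfying the criterion."""
    scored = [(similarity(simulate(g), human_trace), g)
              for g in candidate_grammars]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    indications = [g for score, g in scored if score > threshold]
    return scored[0][1], indications
```

Here `simulate` stands in for the expensive step of training a virtual agent in the environment a grammar generates and recording its movement; the returned `indications` mirror the claim's stored associations between scene grammars and the particular task, which a later run can reuse to build a new training environment.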