US 12,229,223 B2
Agent environment co-creation using reinforcement learning
Robert P. Dooley, Dublin, CA (US); and Dylan James Snow, Concord, CA (US)
Assigned to Accenture Global Solutions Limited, Dublin (IE)
Filed by Accenture Global Solutions Limited, Dublin (IE)
Filed on Jul. 2, 2020, as Appl. No. 16/919,388.
Prior Publication US 2022/0004814 A1, Jan. 6, 2022
Int. Cl. G06F 9/451 (2018.01); G06F 18/21 (2023.01); G06F 18/214 (2023.01); G06F 18/2413 (2023.01)
CPC G06F 18/2413 (2023.01) [G06F 9/453 (2018.02); G06F 18/2148 (2023.01); G06F 18/2163 (2023.01); G06F 18/217 (2023.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method, comprising:
receiving, from a requestor, a request that includes an untrained virtual agent, wherein the untrained virtual agent is a neural network;
obtaining an environment definition that is used to programmatically generate synthesized training environments for training agents to perform a skill that includes multiple intermediate skills, the synthesized training environments being programmatically generated based on production rules that, for each of multiple training environment complexities, define objects and relationships between the objects, and the production rules for different training environment complexities having different reward functions that reward agents for successfully completing different intermediate skills;
obtaining, from the environment definition, initial production rules for an initial training complexity that is associated with an initial reward function that rewards virtual agents for successfully completing an initial intermediate skill;
generating an initial synthesized training environment using the initial production rules for the initial training complexity;
training, using reinforcement learning, the neural network corresponding to the untrained virtual agent to perform a particular task in the initial synthesized training environment;
determining a training success rate of the virtual agent in performing the particular task in the initial synthesized training environment using the initial reward function associated with completing the initial intermediate skill;
before the virtual agent has completed the initial intermediate skill, determining that the training success rate of the virtual agent in performing the particular task in the initial synthesized training environment satisfies criteria associated with increasing or decreasing training environment complexity to provide a subsequent training complexity, wherein determining that the training success rate satisfies the criteria includes:
determining that the training success rate satisfies the criteria associated with the increasing training environment complexity when the training success rate is greater than or equal to a threshold rate, or
determining that the training success rate satisfies the criteria associated with the decreasing training environment complexity when the training success rate is less than the threshold rate, wherein decreasing the training environment complexity for each production rule is based on one or more of:
a midpoint between a current value for the production rule and a last value of the production rule at the lowest complexity from which the complexity is being increased, and
one half of the current value and the value for the lowest complexity for the production rule;
obtaining subsequent production rules for the subsequent training complexity that is greater or less than the initial training complexity, based on the determination that the training success rate satisfies the criteria associated with increasing or decreasing the training complexity, and that is associated with a reward function that rewards virtual agents for successfully completing the initial intermediate skill and one or more different intermediate skills;
generating a subsequent synthesized training environment using the subsequent production rules for the subsequent training complexity;
training, using the reinforcement learning, the neural network corresponding to the virtual agent that has not completed the initial intermediate skill to perform the particular task in the subsequent synthesized training environment;
determining a training success rate of the virtual agent in performing the particular task in the subsequent synthesized training environment using the reward function associated with successfully completing the initial intermediate skill and the one or more different intermediate skills;
determining that the training success rate of the virtual agent in performing the initial intermediate skill and the one or more different intermediate skills that are associated with the particular task in the subsequent synthesized training environment satisfies criteria associated with completing training; and
providing the virtual agent that was trained in the subsequent synthesized training environment to the requestor in response to the request.
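
Taken together, the steps of claim 1 describe a curriculum-learning loop: synthesize an environment from production rules at some complexity level, train the agent with reinforcement learning against that level's reward function, advance to a higher complexity when the measured success rate clears the threshold, and otherwise back each production rule off toward its lowest-complexity value. The Python sketch below illustrates that loop under stated assumptions; it is not taken from the patent. Every name in it (ProductionRule, ComplexityLevel, curriculum_train, and so on) is hypothetical, and the reinforcement-learning inner loop is reduced to a scalar "skill" update in place of neural-network training so the example runs end to end.

    import random
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class ProductionRule:
        name: str
        value: float         # current setting, e.g. object count or maze size
        lowest_value: float  # this rule's value at the lowest complexity level

    @dataclass
    class ComplexityLevel:
        rules: List[ProductionRule]
        reward_fn: Callable[[float], float]  # rewards this level's intermediate skill(s)

    def generate_environment(level: ComplexityLevel) -> Dict[str, float]:
        # A real system would instantiate objects and their relationships from
        # the production rules; here the rule settings themselves stand in for
        # the environment, with their sum acting as its difficulty.
        return {rule.name: rule.value for rule in level.rules}

    def decrease_rule(rule: ProductionRule) -> float:
        # Midpoint back-off from claim 1: move the rule toward its
        # lowest-complexity value rather than resetting it outright.
        return (rule.value + rule.lowest_value) / 2.0

    def train_and_evaluate(agent: Dict[str, float], env: Dict[str, float],
                           reward_fn: Callable[[float], float],
                           episodes: int) -> float:
        # Toy stand-in for the RL inner loop: each "episode" nudges the agent's
        # scalar skill (a real system would update neural-network weights) and
        # scores success against the environment's difficulty.
        difficulty = sum(env.values())
        successes = 0
        for _ in range(episodes):
            agent["skill"] += 0.01 * reward_fn(agent["skill"])
            successes += agent["skill"] + random.random() > difficulty
        return successes / episodes

    def curriculum_train(agent: Dict[str, float], levels: List[ComplexityLevel],
                         threshold: float = 0.8,
                         episodes: int = 200) -> Dict[str, float]:
        idx = 0
        while True:
            level = levels[idx]
            rate = train_and_evaluate(agent, generate_environment(level),
                                      level.reward_fn, episodes)
            if rate >= threshold:
                if idx + 1 == len(levels):
                    return agent            # criteria for completing training met
                idx += 1                    # increase training complexity
            else:
                for rule in level.rules:    # decrease training complexity
                    rule.value = decrease_rule(rule)

    if __name__ == "__main__":
        levels = [
            ComplexityLevel([ProductionRule("maze_size", 1.0, 1.0)], lambda s: 1.0),
            ComplexityLevel([ProductionRule("maze_size", 3.0, 1.0)], lambda s: 1.0),
        ]
        print(curriculum_train({"skill": 0.0}, levels))

The decrease_rule function here follows the first alternative recited in claim 1, the midpoint between a rule's current value and its lowest-complexity value. One reading of the second alternative ("one half of the current value...") would instead return max(rule.value / 2.0, rule.lowest_value); the claim language leaves that interpretation open.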