US 11,900,233 B2
Method and system for interactive imitation learning in video games
Arthur William Juliani, Jr., San Francisco, CA (US); and Mohamed Marwan A. Mattar, San Francisco, CA (US)
Assigned to Unity IPR ApS, Copenhagen (DK)
Filed by Unity IPR ApS, Copenhagen (DK)
Filed on May 26, 2022, as Appl. No. 17/826,050.
Application 17/826,050 is a continuation of application No. 16/657,868, filed on Oct. 18, 2019, granted, now Pat. No. 11,369,879.
Claims priority of provisional application 62/747,471, filed on Oct. 18, 2018.
Prior Publication US 2022/0355205 A1, Nov. 10, 2022
Int. Cl. G06N 3/006 (2023.01); A63F 13/67 (2014.01); G06N 20/00 (2019.01); G06F 18/214 (2023.01); G06V 10/764 (2022.01); G06N 3/02 (2006.01)

CPC G06N 3/006 (2013.01) [A63F 13/67 (2014.09); G06F 18/214 (2023.01); G06N 20/00 (2019.01); G06V 10/764 (2022.01); G06N 3/02 (2013.01)]

17 Claims

1. A system comprising:

one or more computer processors;

one or more computer memories; and

a set of instructions incorporated into the one or more computer memories, the set of instructions configuring the one or more computer processors to perform operations, the operations comprising:

receiving first input data from one or more input devices, the first input data describing a first set of example actions;

using inverse reinforcement learning to estimate a reward function for the first set of example actions;

using the reward function and the first set of example actions as inputs to a reinforcement learning model to train a machine learning agent;

analyzing a set of performed actions performed by the trained machine learning agent to determine a measure of failure of the training of the machine learning agent, wherein the analyzing includes analyzing a label identifying an action of the set of performed actions as a positive action or a negative action;

based on the measure of failure reaching a threshold, pausing the machine learning agent and requesting a second set of example actions from the one or more input devices;

using the second set of example actions in addition to the first set of example actions to estimate a new reward function; and

using the new reward function, the first set of example actions, and the second set of example actions as additional inputs to the reinforcement learning model to train the machine learning agent.
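
For illustration only, the control flow recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the 1-D corridor environment, the visit-frequency reward estimate (a crude stand-in for inverse reinforcement learning), the tabular Q-learning trainer, and every name used (estimate_reward, train_agent, rollout, failure_measure, interactive_training, request_more, label) are hypothetical and do not appear in the patent.

```python
import random
from collections import Counter

N_STATES = 6          # hypothetical 1-D corridor, states 0..5
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Move within the corridor, clamping at both ends."""
    return min(max(state + action, 0), N_STATES - 1)

def estimate_reward(demos):
    """Crude stand-in for IRL (assumption): score each state by how often
    the demonstrated (state, action) pairs arrive in it."""
    visits = Counter(step(s, a) for s, a in demos)
    total = sum(visits.values()) or 1
    return [visits[s] / total for s in range(N_STATES)]

def train_agent(reward, demos, episodes=200, seed=0):
    """Tabular Q-learning on the estimated reward; the demos are also used
    directly by giving demonstrated pairs an initial Q-value bonus."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for s, a in demos:
        q[(s, a)] += 1.0
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(10):
            if rng.random() < 0.2:                      # explore
                a = rng.choice(ACTIONS)
            else:                                       # exploit
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            s2 = step(s, a)
            target = reward[s2] + 0.9 * max(q[(s2, x)] for x in ACTIONS)
            q[(s, a)] += 0.5 * (target - q[(s, a)])
            s = s2
    return q

def rollout(q, start, length=5):
    """Let the trained agent act greedily and record what it performed."""
    s, performed = start, []
    for _ in range(length):
        a = max(ACTIONS, key=lambda x: q[(s, x)])
        performed.append((s, a))
        s = step(s, a)
    return performed

def failure_measure(performed, label):
    """Fraction of performed actions labelled 'negative'."""
    labels = [label(s, a) for s, a in performed]
    return labels.count("negative") / len(labels)

def interactive_training(first_demos, request_more, label, threshold=0.5):
    """Train, measure failure, and if the threshold is reached, pause and
    ask for a second set of examples before re-estimating and retraining."""
    demos = list(first_demos)
    q = train_agent(estimate_reward(demos), demos)
    failure = failure_measure(rollout(q, start=0), label)
    paused = failure >= threshold
    if paused:
        demos += request_more()                 # second set of example actions
        q = train_agent(estimate_reward(demos), demos)  # new reward, both sets
    return q, failure, paused
```

In this sketch, request_more stands in for the request to the input devices and label stands in for whatever supplies the positive or negative label for each performed action (for example, a human reviewer). A call such as interactive_training([(0, 1), (1, 1)], request_more, label) returns the learned Q-table, the measured failure, and whether training was paused to collect more examples.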