US 12,106,200 B2
Unsupervised detection of intermediate reinforcement learning goals
Pierre Sermanet, Palo Alto, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Feb. 13, 2023, as Appl. No. 18/168,000.
Application 18/168,000 is a continuation of application No. 16/347,651, granted, now 11,580,360, previously published as PCT/US2017/060216, filed on Nov. 6, 2017.
Claims priority of provisional application 62/418,122, filed on Nov. 4, 2016.
Prior Publication US 2023/0196058 A1, Jun. 22, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/006 (2023.01); G06F 18/21 (2023.01); G06F 18/2111 (2023.01); G06F 18/2132 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01)

CPC G06N 3/006 (2013.01) [G06F 18/2111 (2023.01); G06F 18/2132 (2023.01); G06F 18/217 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01)]

20 Claims

1. A computer-implemented method for training an agent to perform a reinforcement learning task, the method comprising:

obtaining a plurality of demonstration sequences, each of the demonstration sequences being a sequence of images of an environment while a respective instance of the reinforcement learning task is being performed;

for each demonstration sequence, processing each image in the demonstration sequence through an image processing neural network comprising a plurality of hidden layers to determine feature values for a respective set of features for the image from activations generated by one or more of the hidden layers;

determining, from the demonstration sequences, a partitioning of the reinforcement learning task into a plurality of subtasks, wherein each image in each demonstration sequence is assigned to a respective subtask of the plurality of subtasks;

determining, from the feature values for the images in the demonstration sequences, a respective set of discriminative features for each of the plurality of subtasks; and

training the agent to perform the reinforcement learning task using one or more perception-based rewards computed from image feature values for the discriminative features for the subtasks.
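
The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how the partitioning, the feature selection, or the reward is computed, so everything below is an assumption made for illustration: the feature values are taken as already extracted (the image processing neural network is omitted), the partitioning is a uniform split of each sequence into equal contiguous segments, features are ranked by a simple mean-separation score, and the reward is a negative squared distance to the demonstration statistics. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev

EPS = 1e-8  # avoids division by zero for constant features


def partition_uniform(seq_len, num_subtasks):
    """Assign each frame index to a subtask via an equal-length contiguous split.

    Assumption: the claim only requires *some* partitioning; a uniform
    split is the simplest stand-in.
    """
    return [min(i * num_subtasks // seq_len, num_subtasks - 1) for i in range(seq_len)]


def discriminative_features(sequences, num_subtasks, top_m):
    """Select, per subtask, the top_m features that best separate it from the rest.

    `sequences` is a list of demonstrations; each demonstration is a list of
    per-image feature vectors (lists of floats), assumed precomputed.
    Returns one dict per subtask: {feature_index: (mean, std)} over the demos.
    """
    num_features = len(sequences[0][0])
    # Pool feature vectors by their assigned subtask.
    by_subtask = [[] for _ in range(num_subtasks)]
    for seq in sequences:
        for frame, k in zip(seq, partition_uniform(len(seq), num_subtasks)):
            by_subtask[k].append(frame)

    selected = []
    for k in range(num_subtasks):
        inside = by_subtask[k]
        outside = [f for j in range(num_subtasks) if j != k for f in by_subtask[j]]
        scores = []
        for d in range(num_features):
            a = [f[d] for f in inside]
            b = [f[d] for f in outside]
            # Hypothetical separation score: gap between means over pooled spread.
            score = abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + EPS)
            scores.append((score, d))
        top = sorted(scores, reverse=True)[:top_m]
        stats = {}
        for _, d in top:
            values = [f[d] for f in inside]
            stats[d] = (mean(values), pstdev(values) + EPS)
        selected.append(stats)
    return selected


def perception_reward(frame_features, subtask_stats):
    """Reward near 0 when the discriminative features match the demo statistics,
    increasingly negative as they deviate (an assumed reward shape)."""
    dists = [((frame_features[d] - mu) / sigma) ** 2
             for d, (mu, sigma) in subtask_stats.items()]
    return -mean(dists)


# Toy demonstrations: feature 0 changes between the first and second half,
# features 1 and 2 carry no information about the subtask.
demos = [
    [[0.0, 0.5, 0.3], [0.1, 0.4, 0.3], [1.0, 0.5, 0.3], [0.9, 0.4, 0.3]],
    [[0.1, 0.4, 0.3], [0.0, 0.5, 0.3], [0.9, 0.5, 0.3], [1.0, 0.4, 0.3]],
]
stats = discriminative_features(demos, num_subtasks=2, top_m=1)
reward_match = perception_reward([0.95, 0.5, 0.3], stats[1])
reward_mismatch = perception_reward([0.05, 0.5, 0.3], stats[1])
```

In this toy setting, feature 0 is selected for both subtasks, and an observation resembling the second half of the demonstrations receives a higher reward for the second subtask than one resembling the first half. An agent could then be trained with any reinforcement learning algorithm using such rewards; that training loop is outside the scope of this sketch.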