US 11,941,088 B1
Image processing of an environment to select an action to be performed by an agent interacting with the environment
Volodymyr Mnih, Toronto (CA); and Koray Kavukcuoglu, London (GB)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on May 5, 2022, as Appl. No. 17/737,544.
Application 17/737,544 is a continuation of application No. 16/927,159, filed on Jul. 13, 2020, granted, now 11,354,548.
Application 16/927,159 is a continuation of application No. 16/250,320, filed on Jan. 17, 2019, granted, now 10,748,041, issued on Aug. 18, 2020.
Application 16/250,320 is a continuation of application No. 14/731,348, filed on Jun. 4, 2015, granted, now 10,223,617, issued on Mar. 5, 2019.
Claims priority of provisional application 62/009,097, filed on Jun. 6, 2014.
Int. Cl. G06V 10/44 (2022.01); G06F 18/2431 (2023.01); G06V 20/80 (2022.01); G06V 30/194 (2022.01); G06V 30/413 (2022.01)
CPC G06F 18/2431 (2023.01) [G06V 10/44 (2022.01); G06V 20/80 (2022.01); G06V 30/194 (2022.01); G06V 30/413 (2022.01)] 20 Claims
1. A method for processing one or more images of an environment to select an action to be performed by an agent interacting with the environment, the method comprising:
at each iteration of a plurality of iterations:
receiving a current image for the iteration, wherein the current image is included in the one or more images;
determining a location in the current image, comprising:
determining the location based on an output of a location neural network for the current iteration if the current iteration is after a first iteration in the plurality of iterations;
extracting a glimpse from the current image using the location;
updating a current internal state of a recurrent neural network using the glimpse extracted from the current image to generate a new internal state, comprising:
generating a glimpse representation of the extracted glimpse, and
processing the glimpse representation using the recurrent neural network to update the current internal state of the recurrent neural network to generate a new internal state;
processing, using the location neural network, the new internal state of the recurrent neural network generated using the glimpse extracted from the image to generate an output of the location neural network for a next iteration in the plurality of iterations; and
at a last iteration of the plurality of iterations, selecting the action to be performed by the agent, including:
processing, using an action neural network, the new internal state of the recurrent neural network at the last iteration to generate a respective action score for each action in a set of actions; and
selecting the action to be performed by the agent using the action scores;
wherein the location neural network, the recurrent neural network, and the action neural network have been trained by an end-to-end optimization procedure.
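The per-iteration loop recited in claim 1 (extract a glimpse at a location, update a recurrent internal state, predict the next location, and score actions at the last iteration) can be illustrated with the minimal forward-pass sketch below. It is written in dependency-free Python for readability and is not the patentee's implementation. Several details are assumptions not taken from the claim: every name (`GlimpseAgent`, `extract_glimpse`, and so on) is hypothetical; the glimpse is a single square crop with zero padding; the glimpse representation is simply the flattened crop concatenated with the normalised location; the first-iteration location is the image centre, since the claim leaves it unspecified; and the networks are single linear layers with random, untrained weights. The end-to-end training the claim requires is omitted entirely.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real system would learn these end to end."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def extract_glimpse(image, loc, size):
    """Crop a size x size patch centred at loc=(row, col); pixels outside the image are 0."""
    h, w = len(image), len(image[0])
    r0, c0 = loc[0] - size // 2, loc[1] - size // 2
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
             for c in range(c0, c0 + size)]
            for r in range(r0, r0 + size)]


class GlimpseAgent:
    """Hypothetical agent: glimpse encoder + RNN + location net + action net (untrained)."""

    def __init__(self, image_shape, glimpse_size, hidden_size, num_actions, seed=0):
        rng = random.Random(seed)
        self.image_shape = image_shape
        self.glimpse_size = glimpse_size
        self.hidden_size = hidden_size
        g_dim = glimpse_size * glimpse_size + 2                   # flattened patch + (row, col)
        self.W_g = rand_matrix(hidden_size, g_dim, rng)           # glimpse representation -> RNN
        self.W_h = rand_matrix(hidden_size, hidden_size, rng)     # recurrent weights
        self.W_loc = rand_matrix(2, hidden_size, rng)             # location network
        self.W_act = rand_matrix(num_actions, hidden_size, rng)   # action network

    def glimpse_representation(self, glimpse, loc):
        """Assumed representation: flattened patch plus location scaled to [0, 1)."""
        h, w = self.image_shape
        return [p for row in glimpse for p in row] + [loc[0] / h, loc[1] / w]

    def rnn_step(self, state, g_rep):
        """Simple tanh RNN cell: new_state = tanh(W_h state + W_g g_rep)."""
        a = matvec(self.W_h, state)
        b = matvec(self.W_g, g_rep)
        return [math.tanh(x + y) for x, y in zip(a, b)]

    def location_net(self, state):
        """Map the internal state to integer pixel coordinates inside the image."""
        h, w = self.image_shape
        out = [math.tanh(v) for v in matvec(self.W_loc, state)]   # each value in (-1, 1)
        return (int(round((out[0] + 1) / 2 * (h - 1))),
                int(round((out[1] + 1) / 2 * (w - 1))))

    def action_net(self, state):
        """One score per action in the action set."""
        return matvec(self.W_act, state)

    def select_action(self, images):
        """Run one iteration per image, then pick the highest-scoring action."""
        state = [0.0] * self.hidden_size
        loc = (self.image_shape[0] // 2, self.image_shape[1] // 2)  # assumed first location
        for image in images:
            glimpse = extract_glimpse(image, loc, self.glimpse_size)
            state = self.rnn_step(state, self.glimpse_representation(glimpse, loc))
            loc = self.location_net(state)  # location used at the next iteration
        scores = self.action_net(state)      # scores from the last iteration's state
        action = max(range(len(scores)), key=scores.__getitem__)
        return action, scores
```

Selecting the action greedily by its highest score is only one way of "selecting the action using the action scores"; sampling from a softmax over the scores would fit the claim language equally well.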