US 12,488,278 B2
Interactive agent and control using reinforcement learning
Katja Hofmann, Cambridge (GB); Luisa Maria Zintgraf, Oxford (GB); Sam Michael Devlin, Trumpington (GB); and Kamil Andrzej Ciosek, Cambridge (GB)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Oct. 9, 2020, as Appl. No. 17/067,284.
Prior Publication US 2022/0114474 A1, Apr. 14, 2022
Int. Cl. G06N 20/00 (2019.01); G06N 3/006 (2023.01); G06N 3/092 (2023.01); G06N 5/04 (2023.01); G06N 3/02 (2006.01); G06Q 10/101 (2023.01)
CPC G06N 20/00 (2019.01) [G06N 3/006 (2013.01); G06N 3/092 (2023.01); G06N 5/04 (2013.01); G06N 3/02 (2013.01); G06Q 10/101 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method performed by a computer-implemented agent in an environment, the method comprising:
storing a reinforcement learning policy for use by the computer-implemented agent to act in the environment;
storing a distribution as a latent representation of a belief of the computer-implemented agent about another agent in the environment, wherein the latent representation is computed from an input of an observation made by the computer-implemented agent of a behavior of the other agent in the environment, wherein the latent representation comprises a permanent component denoting permanent behaviors of the other agent and a temporal component denoting temporal behaviors of the other agent, and wherein the latent representation includes a hierarchical structure wherein the temporal component of the other agent depends on the permanent component of the other agent, the temporal component being generated from a first layer of an encoder, the permanent component being generated from a second layer of the encoder, wherein the first layer is deeper in the hierarchical structure than the second layer; and
executing the computer-implemented agent according to the policy conditioned on parameters characterizing the distribution.
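The claimed arrangement can be illustrated with a minimal sketch. This is not the patent's implementation; it is a toy NumPy construction, under assumed names (`BeliefEncoder`, `policy`) and assumed Gaussian belief components, showing the hierarchical structure the claim recites: a shallower (second) layer yields the permanent component, a deeper (first) layer yields the temporal component conditioned on the permanent one, and the policy is conditioned on the parameters of both distributions.

```python
import numpy as np

def mlp_layer(x, w, b):
    """One tanh-activated affine layer."""
    return np.tanh(w @ x + b)

class BeliefEncoder:
    """Hypothetical encoder mapping an observation of the other agent's
    behavior to a latent belief distribution.

    Permanent component: output of a shallower layer (long-lived traits).
    Temporal component: output of a deeper layer that takes the permanent
    component as an extra input, giving the hierarchical dependence in
    which the temporal component depends on the permanent component."""

    def __init__(self, obs_dim, lat_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # shallower layer -> permanent component parameters (mean, log-std)
        self.w_perm = rng.normal(size=(2 * lat_dim, obs_dim)) * scale
        self.b_perm = np.zeros(2 * lat_dim)
        # deeper layer -> temporal component parameters, conditioned on
        # both the observation and the permanent mean
        self.w_temp = rng.normal(size=(2 * lat_dim, obs_dim + lat_dim)) * scale
        self.b_temp = np.zeros(2 * lat_dim)
        self.lat_dim = lat_dim

    def encode(self, obs):
        p = mlp_layer(obs, self.w_perm, self.b_perm)
        perm_mean, perm_logstd = p[:self.lat_dim], p[self.lat_dim:]
        t_in = np.concatenate([obs, perm_mean])      # hierarchical conditioning
        t = mlp_layer(t_in, self.w_temp, self.b_temp)
        temp_mean, temp_logstd = t[:self.lat_dim], t[self.lat_dim:]
        # each component is a diagonal Gaussian, returned as (mean, std)
        return (perm_mean, np.exp(perm_logstd)), (temp_mean, np.exp(temp_logstd))

def policy(obs, belief_params, w):
    """Toy policy conditioned on the distribution parameters of the belief,
    as in the final claim element (greedy over linear logits)."""
    (pm, ps), (tm, ts) = belief_params
    features = np.concatenate([obs, pm, ps, tm, ts])
    return int(np.argmax(w @ features))

# usage: encode one observation of the other agent, then act
obs_dim, lat_dim, n_actions = 4, 2, 3
enc = BeliefEncoder(obs_dim, lat_dim)
obs = np.ones(obs_dim)
belief = enc.encode(obs)
w_pi = np.random.default_rng(1).normal(size=(n_actions, obs_dim + 4 * lat_dim))
action = policy(obs, belief, w_pi)
```

In a full system the encoder would be trained (e.g., variationally) from sequences of observed behavior and the policy learned by reinforcement learning; the sketch only fixes the data flow the claim describes.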