CPC G06N 20/00 (2019.01) [G06N 3/006 (2013.01); G06N 3/092 (2023.01); G06N 5/04 (2013.01); G06N 3/02 (2013.01); G06Q 10/101 (2013.01)]    20 Claims

1. A method performed by a computer-implemented agent in an environment, the method comprising:
storing a reinforcement learning policy for use by the computer-implemented agent to act in the environment;
storing a distribution as a latent representation of a belief of the computer-implemented agent about another agent in the environment,
wherein the latent representation is computed from an input of an observation made by the computer-implemented agent of a behavior of the other agent in the environment,
wherein the latent representation comprises a permanent component denoting permanent behaviors of the other agent and a temporal component denoting temporal behaviors of the other agent, and
wherein the latent representation includes a hierarchical structure wherein the temporal component of the other agent depends on the permanent component of the other agent, the temporal component being generated from a first layer of an encoder, the permanent component being generated from a second layer of the encoder, wherein the first layer is deeper in the hierarchical structure than the second layer; and
executing the computer-implemented agent according to the policy conditioned on parameters characterizing the distribution.
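The data flow recited in claim 1 can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the claim does not specify network sizes, distribution families, or training procedure, so the diagonal-Gaussian belief, the two-layer MLP encoder, the dimensions, and all names (`enc_perm`, `enc_temp`, `policy_head`, etc.) are assumptions introduced here. The sketch shows only the claimed structure: a permanent component produced by a shallower encoder layer, a temporal component produced by a deeper layer that depends on the permanent component, and a policy conditioned on the parameters characterizing the resulting distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, PERM_DIM, TEMP_DIM, ACT_DIM = 8, 4, 4, 3  # illustrative sizes

def make_linear(in_dim, out_dim):
    # Random, untrained weights: this sketch illustrates data flow, not learning.
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

# Second (shallower) encoder layer: observation -> permanent-component parameters.
enc_perm = make_linear(OBS_DIM, 2 * PERM_DIM)            # outputs [mean | log-variance]
# First (deeper) encoder layer: observation + permanent sample -> temporal-component
# parameters, realizing the hierarchy in which the temporal component depends on
# the permanent component.
enc_temp = make_linear(OBS_DIM + PERM_DIM, 2 * TEMP_DIM)
# Policy head conditioned on the distribution *parameters* (not samples), per the claim.
policy_head = make_linear(2 * PERM_DIM + 2 * TEMP_DIM + OBS_DIM, ACT_DIM)

def encode(obs):
    # Permanent belief about the other agent (assumed diagonal Gaussian).
    W, b = enc_perm
    perm_params = obs @ W + b
    mu_p, logvar_p = np.split(perm_params, 2)
    # Reparameterized sample of the permanent component feeds the deeper layer.
    z_p = mu_p + np.exp(0.5 * logvar_p) * rng.normal(size=PERM_DIM)
    W, b = enc_temp
    temp_params = np.concatenate([obs, z_p]) @ W + b
    return perm_params, temp_params

def act(obs):
    # Execute the agent according to a policy conditioned on the belief parameters.
    perm_params, temp_params = encode(obs)
    W, b = policy_head
    logits = np.concatenate([perm_params, temp_params, obs]) @ W + b
    return int(np.argmax(logits))                        # greedy action selection

obs = rng.normal(size=OBS_DIM)   # one observation of the other agent's behavior
action = act(obs)
```

Note the design point the claim turns on: the policy consumes the parameters characterizing the belief distribution (means and log-variances of both components), so the agent's action can reflect its uncertainty about the other agent, not just a point estimate of that agent's behavior.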