| CPC G06N 20/00 (2019.01) [G06F 9/46 (2013.01); G06F 16/2379 (2019.01); G06N 3/006 (2013.01); G06N 3/098 (2023.01); G06N 5/043 (2013.01)] | 12 Claims |

1. A computer-implemented method for exploring, by a table-based parallel reinforcement learning, PRL, algorithm, an unexplored domain comprising a plurality of agents and states, the unexplored domain represented by a state-action space, the method comprising the following steps performed by one or more of the plurality of agents:
receiving an assigned partition of the state-action space represented by a table;
executing, during a plurality of episodes, actions for states within the partition, wherein an action transits the agent to a state;
granting a reward to a transited state;
exchanging state-action values with other agents of the plurality of agents in the domain;
updating the table;
subdividing the partition into subpartitions based on the number of agents, wherein a subpartition comprises a subset of one or more states, by ordering the one or more states of the subpartitions in descending order of a number of times a respective state is transited to; and
deriving a local affinity policy based on actions transiting the agent to states within its respective partition.
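Claim 1 outlines the per-agent procedure of the table-based PRL algorithm. The sketch below is one hypothetical realization of those steps, not the patented implementation: it assumes a tabular Q-learning update, a toy deterministic ring environment, an element-wise maximum for the exchange of state-action values, and a round-robin split for the subdivision. All identifiers (PartitionedAgent, q_table, visit_counts, ring_env) are illustrative assumptions.

```python
# Illustrative sketch only: one hypothetical per-agent realization of the
# claimed steps, assuming tabular Q-learning and a toy deterministic ring
# environment. No identifier below is taken from the patent.
import random
from collections import defaultdict


class PartitionedAgent:
    def __init__(self, partition, actions, alpha=0.1, gamma=0.9):
        self.partition = set(partition)        # assigned partition of the state-action space
        self.actions = list(actions)
        self.q_table = defaultdict(float)      # table of state-action values
        self.visit_counts = defaultdict(int)   # number of times a state is transited to
        self.alpha, self.gamma = alpha, gamma

    def run_episodes(self, env, n_episodes=200, max_steps=20):
        """Execute actions for states within the partition during a plurality of episodes."""
        for _ in range(n_episodes):
            state = random.choice(sorted(self.partition))
            for _ in range(max_steps):
                action = random.choice(self.actions)
                next_state, reward = env(state, action)   # reward granted to the transited state
                self.visit_counts[next_state] += 1
                best_next = max(self.q_table[(next_state, a)] for a in self.actions)
                # One possible way to update the table: standard tabular Q-learning.
                self.q_table[(state, action)] += self.alpha * (
                    reward + self.gamma * best_next - self.q_table[(state, action)]
                )
                if next_state not in self.partition:       # keep exploration in the partition
                    break
                state = next_state

    def exchange(self, other_agents):
        """Exchange state-action values with the other agents in the domain (element-wise max here)."""
        for other in other_agents:
            for key, value in other.q_table.items():
                self.q_table[key] = max(self.q_table[key], value)

    def subdivide(self, n_agents):
        """Subdivide the partition into subpartitions, ordering states by visit count, descending."""
        ordered = sorted(self.partition, key=lambda s: self.visit_counts[s], reverse=True)
        return [ordered[i::n_agents] for i in range(n_agents)]   # round-robin split (an assumption)

    def local_affinity_policy(self, env):
        """Prefer the highest-valued action that transits the agent to a state within its partition."""
        policy = {}
        for state in sorted(self.partition):
            local = [a for a in self.actions if env(state, a)[0] in self.partition]
            policy[state] = max(local or self.actions, key=lambda a: self.q_table[(state, a)])
        return policy


def ring_env(state, action, n_states=8):
    """Toy deterministic environment: states 0..n-1 on a ring; reward for reaching state 0."""
    next_state = (state + action) % n_states
    return next_state, 1.0 if next_state == 0 else 0.0


if __name__ == "__main__":
    agents = [PartitionedAgent(range(0, 4), actions=(-1, 1)),
              PartitionedAgent(range(4, 8), actions=(-1, 1))]
    for agent in agents:
        agent.run_episodes(ring_env)
    for agent in agents:
        agent.exchange([a for a in agents if a is not agent])
    print(agents[0].local_affinity_policy(ring_env))
    print(agents[0].subdivide(n_agents=2))
```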