US 12,340,283 B2
Exploring an unexplored domain by parallel reinforcement
Maxim Claeys, Eeklo (BE); Miguel Camelo, Wilrijk (BE); and Steven Latre, Lokeren (BE)
Assigned to IMEC VZW, Leuven (BE); and UNIVERSITEIT ANTWERPEN, Antwerp (BE)
Appl. No. 17/283,376
Filed by IMEC VZW, Leuven (BE); and UNIVERSITEIT ANTWERPEN, Antwerp (BE)
PCT Filed Oct. 11, 2019, PCT No. PCT/EP2019/077563
§ 371(c)(1), (2) Date Apr. 7, 2021,
PCT Pub. No. WO2020/074689, PCT Pub. Date Apr. 16, 2020.
Claims priority of application No. 18200069 (EP), filed on Oct. 12, 2018.
Prior Publication US 2021/0383273 A1, Dec. 9, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 9/46 (2006.01); G06F 16/23 (2019.01); G06N 3/006 (2023.01); G06N 3/098 (2023.01); G06N 5/043 (2023.01)
CPC G06N 20/00 (2019.01) [G06F 9/46 (2013.01); G06F 16/2379 (2019.01); G06N 3/006 (2013.01); G06N 3/098 (2023.01); G06N 5/043 (2013.01)]
12 Claims
1. A computer-implemented method for exploring, by a table-based parallel reinforcement learning, PRL, algorithm, an unexplored domain comprising a plurality of agents and states, the unexplored domain represented by a state-action space, the method comprising the following steps performed by one or more of the plurality of agents:
receiving an assigned partition of the state-action space represented by a table; and
executing during a plurality of episodes actions for states within the partition, wherein an action transits a state; and
granting to a transited state a reward; and
exchanging state-action values with other agents of the plurality of agents in the domain;
updating the table;
subdividing the partition into subpartitions based on the number of agents, wherein a subpartition comprises a subset of one or more states, by ordering the one or more states of the subpartitions based on a number of times a respective state is transited to in a descending order; and
deriving a local affinity policy based on actions transiting the agent to states within its respective partition.
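
The last two steps of claim 1 (subdividing by visit count and deriving a local affinity policy) can be illustrated with a minimal sketch. It is not the patented implementation. The function names `subdivide` and `local_affinity_policy`, the integer state identifiers, the near-equal contiguous split of the ordered states, and the dictionary of observed deterministic transitions are all assumptions made for illustration; the claim itself only specifies that states are ordered by the number of times they are transited to, in descending order, and that the policy is based on actions that transit the agent to states within its partition.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = int          # assumption: states are identified by integers
Action = Hashable    # assumption: any hashable action label


def subdivide(partition: Set[State],
              visits: Dict[State, int],
              n_agents: int) -> List[List[State]]:
    """Order the partition's states by visit count (descending), then cut
    the ordered list into n_agents contiguous subpartitions.

    The contiguous, near-equal split is an assumption of this sketch.
    """
    if n_agents <= 0:
        raise ValueError("n_agents must be positive")
    # Ties are broken by state id so the result is deterministic.
    ordered = sorted(partition, key=lambda s: (-visits.get(s, 0), s))
    size, extra = divmod(len(ordered), n_agents)
    chunks: List[List[State]] = []
    start = 0
    for i in range(n_agents):
        end = start + size + (1 if i < extra else 0)
        chunks.append(ordered[start:end])
        start = end
    return chunks


def local_affinity_policy(partition: Set[State],
                          transitions: Dict[Tuple[State, Action], State]
                          ) -> Dict[State, List[Action]]:
    """For each state in the partition, list the actions that were observed
    to transit the agent to a state inside the same partition.

    `transitions` maps (state, action) to the observed next state and
    assumes deterministic transitions (an assumption of this sketch).
    """
    policy: Dict[State, List[Action]] = {s: [] for s in partition}
    for (state, action), next_state in transitions.items():
        if state in partition and next_state in partition:
            policy[state].append(action)
    return policy


if __name__ == "__main__":
    partition = {0, 1, 2, 3, 4}
    visits = {0: 5, 1: 1, 2: 9, 3: 3}   # state 4 was never transited to
    print(subdivide(partition, visits, n_agents=2))
    # [[2, 0, 3], [1, 4]]

    observed = {(0, "right"): 1, (1, "right"): 7, (1, "left"): 0}
    print(local_affinity_policy(partition, observed))
```

In this sketch the most frequently visited states land in the first subpartition. The action "right" from state 1 is excluded from the policy because it leads to state 7, which lies outside the partition.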