| CPC G06N 20/00 (2019.01) [G06F 9/46 (2013.01); G06F 16/2379 (2019.01); G06N 3/006 (2013.01); G06N 3/098 (2023.01); G06N 5/043 (2013.01)] | 12 Claims |

1. A computer-implemented method for exploring, by a table-based parallel reinforcement learning, PRL, algorithm, an unexplored domain comprising a plurality of agents and states, the unexplored domain represented by a state-action space, the method comprising the following steps performed by one or more of the plurality of agents:
receiving an assigned partition of the state-action space represented by a table;
executing, during a plurality of episodes, actions for states within the partition, wherein an action transits the agent to a state;
granting a reward to a transited state;
exchanging state-action values with other agents of the plurality of agents in the domain;
updating the table;
subdividing the partition into subpartitions based on the number of agents, wherein a subpartition comprises a subset of one or more states, by ordering the one or more states of the subpartitions in descending order of a number of times a respective state is transited to; and
deriving a local affinity policy based on actions transiting the agent to states within its respective partition.
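Claim 1 outlines the per-agent procedure of the table-based PRL algorithm. The sketch below is one hypothetical realization of those steps, not the patented implementation: it assumes a tabular Q-learning update, a toy deterministic ring environment, an element-wise maximum for the exchange of state-action values, and a round-robin split for the subdivision. All identifiers (PartitionedAgent, q_table, visit_counts, ring_env) are illustrative assumptions.

```python
# Illustrative sketch only: one hypothetical per-agent realization of the
# claimed steps, assuming tabular Q-learning and a toy deterministic ring
# environment. No identifier below is taken from the patent.
import random
from collections import defaultdict


class PartitionedAgent:
    def __init__(self, partition, actions, alpha=0.1, gamma=0.9):
        self.partition = set(partition)        # assigned partition of the state-action space
        self.actions = list(actions)
        self.q_table = defaultdict(float)      # table of state-action values
        self.visit_counts = defaultdict(int)   # number of times a state is transited to
        self.alpha, self.gamma = alpha, gamma

    def run_episodes(self, env, n_episodes=200, max_steps=20):
        """Execute actions for states within the partition during a plurality of episodes."""
        for _ in range(n_episodes):
            state = random.choice(sorted(self.partition))
            for _ in range(max_steps):
                action = random.choice(self.actions)
                next_state, reward = env(state, action)   # reward granted to the transited state
                self.visit_counts[next_state] += 1
                best_next = max(self.q_table[(next_state, a)] for a in self.actions)
                # One possible way to update the table: standard tabular Q-learning.
                self.q_table[(state, action)] += self.alpha * (
                    reward + self.gamma * best_next - self.q_table[(state, action)]
                )
                if next_state not in self.partition:       # keep exploration in the partition
                    break
                state = next_state

    def exchange(self, other_agents):
        """Exchange state-action values with the other agents in the domain (element-wise max here)."""
        for other in other_agents:
            for key, value in other.q_table.items():
                self.q_table[key] = max(self.q_table[key], value)

    def subdivide(self, n_agents):
        """Subdivide the partition into subpartitions, ordering states by visit count, descending."""
        ordered = sorted(self.partition, key=lambda s: self.visit_counts[s], reverse=True)
        return [ordered[i::n_agents] for i in range(n_agents)]   # round-robin split (an assumption)

    def local_affinity_policy(self, env):
        """Prefer the highest-valued action that transits the agent to a state within its partition."""
        policy = {}
        for state in sorted(self.partition):
            local = [a for a in self.actions if env(state, a)[0] in self.partition]
            policy[state] = max(local or self.actions, key=lambda a: self.q_table[(state, a)])
        return policy


def ring_env(state, action, n_states=8):
    """Toy deterministic environment: states 0..n-1 on a ring; reward for reaching state 0."""
    next_state = (state + action) % n_states
    return next_state, 1.0 if next_state == 0 else 0.0


if __name__ == "__main__":
    agents = [PartitionedAgent(range(0, 4), actions=(-1, 1)),
              PartitionedAgent(range(4, 8), actions=(-1, 1))]
    for agent in agents:
        agent.run_episodes(ring_env)
    for agent in agents:
        agent.exchange([a for a in agents if a is not agent])
    print(agents[0].local_affinity_policy(ring_env))
    print(agents[0].subdivide(n_agents=2))
```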