US 12,093,001 B2
Operation rule determination device, method, and recording medium using frequency of a cumulative reward calculated for series of operations
Takuya Hiraoka, Tokyo (JP); and Takashi Onishi, Tokyo (JP)
Assigned to NEC CORPORATION, Tokyo (JP)
Appl. No. 17/611,694
Filed by NEC Corporation, Tokyo (JP)
PCT Filed May 22, 2019, PCT No. PCT/JP2019/020324
§ 371(c)(1), (2) Date Nov. 16, 2021,
PCT Pub. No. WO2020/235061, PCT Pub. Date Nov. 26, 2020.
Prior Publication US 2022/0197230 A1, Jun. 23, 2022
Int. Cl. G05B 13/04 (2006.01); G05B 15/02 (2006.01); G08B 31/00 (2006.01)
CPC G05B 15/02 (2013.01) [G08B 31/00 (2013.01)] 7 Claims
1. An operation rule determination device comprising:
at least one memory storing instructions; and
at least one processor configured to execute the instructions to:
obtain a state of a control target after each operation and a reward associated with the state for a series of operations on the control target, by using reward information associating the state with the reward corresponding to the state;
calculate a cumulative reward obtained by accumulating the obtained reward for the series of operations;
when the cumulative reward satisfies a condition, reduce the reward associated with the state after the series of operations in the reward information;
calculate the cumulative reward for a plurality of the series of operations;
obtain a frequency of the cumulative reward calculated for the plurality of the series of operations; and
determine the condition using the obtained frequency.
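The claimed steps can be illustrated with a small sketch. It is a minimal Python sketch under stated assumptions, not the patentee's implementation, and every name in it is hypothetical. The control target is modeled by an assumed deterministic `transition(state, operation)` function. The reward information is a plain dict from state to reward, and states not listed in it earn 0. The frequency is an exact-value histogram built with `collections.Counter`. The condition is assumed to be "cumulative reward at or above a threshold", with the threshold chosen from the frequency so that at most a hypothetical `tail_fraction` of the series satisfy it. The reduction amount `penalty` is also an assumption, because the claim does not say how much the reward is reduced.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# All names below (run_series, determine_threshold, update_reward_information,
# tail_fraction, penalty, transition) are hypothetical, not from the patent.
State = Hashable
Operation = Hashable
Transition = Callable[[State, Operation], State]


def run_series(initial_state: State, operations: Sequence[Operation],
               transition: Transition,
               reward_info: Dict[State, float]) -> Tuple[State, List[float]]:
    """Apply a series of operations and record the reward of the state after each."""
    state = initial_state
    rewards: List[float] = []
    for op in operations:
        state = transition(state, op)
        # Assumption: states missing from the reward information earn 0.
        rewards.append(reward_info.get(state, 0.0))
    return state, rewards


def frequency_of_cumulative_rewards(totals: Sequence[float]) -> Counter:
    """Histogram: how many series produced each exact cumulative reward."""
    return Counter(totals)


def determine_threshold(freq: Counter, tail_fraction: float) -> float:
    """Assumed condition: the smallest value c such that the share of series
    with cumulative reward >= c is at most tail_fraction (inf if none)."""
    total = sum(freq.values())
    at_or_above = 0
    threshold = float("inf")
    for value in sorted(freq, reverse=True):
        at_or_above += freq[value]
        if at_or_above / total > tail_fraction:
            break
        threshold = value
    return threshold


def update_reward_information(initial_state: State,
                              series_list: Sequence[Sequence[Operation]],
                              transition: Transition,
                              reward_info: Dict[State, float],
                              tail_fraction: float = 0.2,
                              penalty: float = 1.0) -> float:
    """Score every series, derive the condition from the frequency of the
    cumulative rewards, then reduce the reward of qualifying final states."""
    results = [run_series(initial_state, ops, transition, reward_info)
               for ops in series_list]
    totals = [sum(rewards) for _, rewards in results]
    threshold = determine_threshold(frequency_of_cumulative_rewards(totals),
                                    tail_fraction)
    # Collect final states first so that a state shared by several
    # qualifying series is reduced only once.
    to_reduce = {final for (final, _), total in zip(results, totals)
                 if total >= threshold}
    for state in to_reduce:
        reward_info[state] = reward_info.get(state, 0.0) - penalty
    return threshold


if __name__ == "__main__":
    # Toy control target: the state is an integer and each operation adds to it.
    rewards = {1: 1.0, 2: 2.0, 3: 3.0}
    series = [[1], [1, 1], [1, 1, 1], [2], [1]]
    thr = update_reward_information(0, series, lambda s, op: s + op, rewards)
    print(thr, rewards)
```

In the toy run, the five series have cumulative rewards 1, 3, 6, 2, and 1. Only the top fifth (cumulative reward 6) satisfies the derived condition, so the reward of its final state 3 drops from 3.0 to 2.0. All series are scored against the same reward information before any reduction, which makes the result independent of series order. Real-valued rewards would normally be binned before their frequency is counted.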