CPC G05B 15/02 (2013.01) [G08B 31/00 (2013.01)] | 7 Claims |
1. An operation rule determination device comprising:
at least one memory storing instructions; and
at least one processor configured to execute the instructions to:
obtain a state of a control target after each operation and a reward associated with the state for a series of operations on the control target, by using reward information associating the state with the reward corresponding to the state;
calculate a cumulative reward obtained by accumulating the obtained reward for the series of operations;
when the cumulative reward satisfies a condition, reduce the reward associated with the state after the series of operations in the reward information;
calculate the cumulative reward for a plurality of the series of operations;
obtain a frequency of the cumulative reward calculated for the plurality of the series of operations; and
determine the condition using the obtained frequency.
|
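The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the function names (`run_series`, `most_frequent_value`, `reduce_reward`), the integer states, the additive `transition`, and the fixed `penalty` are all hypothetical. In particular, the claim only says the condition is determined "using the obtained frequency"; the sketch assumes one possible reading, namely that the condition is met when a cumulative reward equals the most frequently observed value.

```python
from collections import Counter


def run_series(start, operations, transition, reward_info):
    """Apply each operation, look up the reward for the resulting state,
    and return the final state together with the cumulative reward."""
    state, cumulative = start, 0.0
    for op in operations:
        state = transition(state, op)
        cumulative += reward_info.get(state, 0.0)
    return state, cumulative


def most_frequent_value(cumulative_rewards):
    """Assumed condition rule: pick the cumulative reward that occurs most often."""
    value, _count = Counter(cumulative_rewards).most_common(1)[0]
    return value


def reduce_reward(reward_info, final_state, cumulative, target, penalty=1.0):
    """If the cumulative reward satisfies the (assumed) condition, lower the
    reward stored for the state reached after the series of operations."""
    if cumulative == target:
        reward_info[final_state] = reward_info.get(final_state, 0.0) - penalty


# Hypothetical toy setup: states are integers, an operation adds to the state.
reward_info = {1: 1.0, 2: 5.0, 3: 2.0}
transition = lambda state, op: state + op
series_list = [[1, 1], [1, 1], [1, 2]]

# Cumulative reward for a plurality of series, all evaluated before any update.
results = [run_series(0, ops, transition, reward_info) for ops in series_list]

# Frequency of the cumulative rewards determines the condition.
target = most_frequent_value([cumulative for _, cumulative in results])

# Reduce the reward of the final state for every series meeting the condition.
for final_state, cumulative in results:
    reduce_reward(reward_info, final_state, cumulative, target)
```

In this toy run, two of the three series end in state 2 with the same cumulative reward, so the reward stored for state 2 is lowered twice while the reward for state 3 is left unchanged. A real system would more likely bin continuous cumulative rewards into a histogram instead of comparing exact values.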