CPC B25J 9/163 (2013.01) | 7 Claims |
1. A method for training a control strategy, comprising the following steps:
providing training data that demonstrate a control behavior according to which control actions are to be generated; and
training the control strategy using imitation learning by minimizing a measure of deviation between a distribution of state transitions according to the control strategy and a distribution of state transitions according to the demonstrated control behavior using the training data, the measure of deviation being minimized by applying a maximum entropy reinforcement learning method to a reward whose joint maximization with an entropy of the control strategy corresponds to a minimization of the measure of deviation;
wherein a conditioned normalizing flow probability model for the transition distribution and a conditioned normalizing flow probability model for the inverse action distribution are trained, the measure of deviation is minimized using the trained conditioned normalizing flow probability models, and the reward is estimated using the trained conditioned normalizing flow probability model for the transition distribution and the trained conditioned normalizing flow probability model for the inverse action distribution.
|
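The relation in claim 1 between the reward, the entropy of the control strategy, and the measure of deviation can be illustrated with a short derivation. This is only a sketch under assumptions that the claim does not state: the measure of deviation is taken to be a per-step Kullback-Leibler divergence, and all symbols below (\pi, p, \mu^{\pi}, \mu^{E}, \pi', r, D) are notation introduced here for illustration.

```latex
% Assumed notation (not taken from the claim):
%   \pi(a | s)        control strategy (policy)
%   p(s' | s, a)      transition distribution given state and action
%   \mu^{\pi}(s' | s) state-transition distribution induced by the policy
%   \mu^{E}(s' | s)   state-transition distribution of the demonstrated behavior
%   \pi'(a | s, s')   inverse action distribution
% Assumption: the measure of deviation D is a KL divergence.
\begin{align}
\pi'(a \mid s, s') &= \frac{p(s' \mid s, a)\,\pi(a \mid s)}{\mu^{\pi}(s' \mid s)}
  && \text{(Bayes' rule)} \\
D &= \mathbb{E}_{s,\,a \sim \pi,\,s' \sim p}
     \big[\log \mu^{\pi}(s' \mid s) - \log \mu^{E}(s' \mid s)\big] \\
  &= \mathbb{E}\big[\log p(s' \mid s, a) - \log \pi'(a \mid s, s')
     - \log \mu^{E}(s' \mid s)\big] + \mathbb{E}\big[\log \pi(a \mid s)\big] \\
  &= -\,\mathbb{E}_{s,\,a \sim \pi}\big[r(s, a)\big]
     - \mathbb{E}_{s}\big[\mathcal{H}\big(\pi(\cdot \mid s)\big)\big], \\
r(s, a) &= \mathbb{E}_{s' \sim p(\cdot \mid s, a)}
     \big[-\log p(s' \mid s, a) + \log \pi'(a \mid s, s')
     + \log \mu^{E}(s' \mid s)\big].
\end{align}
```

Under these assumptions, minimizing D is the same as jointly maximizing the expected reward r and the entropy of the control strategy, which is the objective of a maximum entropy reinforcement learning method. In this reading, the log-density terms for p and \pi' are the quantities that the two trained conditioned normalizing flow probability models would supply; how the term for \mu^{E} is obtained from the training data is not specified in the claim and is left open here.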