US 12,226,905 B2
Method for training a control strategy
Christoph-Nikolas Straehle, Weil der Stadt (DE); Damian Boborzi, Augsburg (DE); and Jens Stefan Buchner, Bietigheim-Bissingen (DE)
Assigned to ROBERT BOSCH GMBH, Stuttgart (DE)
Filed by Robert Bosch GmbH, Stuttgart (DE)
Filed on Sep. 2, 2022, as Appl. No. 17/902,258.
Claims priority of application No. 10 2021 209 761.8 (DE), filed on Sep. 6, 2021.
Prior Publication US 2023/0081738 A1, Mar. 16, 2023
Int. Cl. B25J 9/16 (2006.01)

CPC B25J 9/163 (2013.01)

7 Claims

1. A method for training a control strategy, comprising the following steps:

providing training data, which demonstrate a control behavior, according to which control actions are to be generated; and

training the control strategy using imitation learning by minimizing a measure of deviation between a distribution of state transitions according to the control strategy and a distribution of state transitions according to the demonstrated control behavior using the training data, the measure of deviation being minimized by applying a maximal entropy reinforcement learning method to a reward, whose shared maximization with an entropy of the control strategy corresponds to a minimization of the measure of deviation;

wherein a training of a conditioned normalizing flow probability model for the transition distribution and of a conditioned normalizing flow probability model for the inverse action distribution and minimization of the measure of deviation using the trained conditioned normalizing flow probability models take place, and the reward is estimated using the trained conditioned normalizing flow probability model for the transition distribution and the trained conditioned normalizing flow probability model for the inverse action distribution.
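
For illustration only, and not as part of the claim, the reward estimation can be sketched in Python. The claim does not state a formula, so the three-term reward below is an assumption: it takes the "transition distribution" to cover both a model of the demonstrated state transitions and a model of the environment dynamics, and combines their log-densities with that of the inverse action distribution. The class ConditionalAffineFlow and the function estimate_reward are hypothetical names, the one-layer affine flow is a toy stand-in for a real conditioned normalizing flow, and the maximum-likelihood training of the flows is omitted.

```python
import numpy as np


class ConditionalAffineFlow:
    """Toy one-layer conditional normalizing flow (hypothetical stand-in).

    Models p(x | c) through x = mu(c) + exp(log_sigma(c)) * z, z ~ N(0, I).
    The weights are random here; in practice they would be fitted by
    maximum likelihood on demonstration or rollout data.
    """

    def __init__(self, x_dim, c_dim, rng):
        self.W_mu = 0.1 * rng.standard_normal((c_dim, x_dim))
        self.W_ls = 0.1 * rng.standard_normal((c_dim, x_dim))

    def log_prob(self, x, c):
        mu = c @ self.W_mu
        log_sigma = np.tanh(c @ self.W_ls)  # bounded log-scale
        z = (x - mu) * np.exp(-log_sigma)   # inverse of the affine map
        log_base = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi), axis=-1)
        # Change of variables: subtract the log-determinant of the scaling.
        return log_base - np.sum(log_sigma, axis=-1)


def estimate_reward(s, a, s_next, expert_transition, dynamics, inverse_action):
    """Assumed reward form (not taken verbatim from the claim):

    r = log p_expert(s' | s) + log p_inv(a | s, s') - log p_dyn(s' | s, a)
    """
    sa = np.concatenate([s, a], axis=-1)
    ss = np.concatenate([s, s_next], axis=-1)
    return (expert_transition.log_prob(s_next, s)
            + inverse_action.log_prob(a, ss)
            - dynamics.log_prob(s_next, sa))


rng = np.random.default_rng(0)
state_dim, action_dim, batch = 3, 2, 4

# Three untrained flows, one per density used in the assumed reward.
expert_transition = ConditionalAffineFlow(state_dim, state_dim, rng)
dynamics = ConditionalAffineFlow(state_dim, state_dim + action_dim, rng)
inverse_action = ConditionalAffineFlow(action_dim, 2 * state_dim, rng)

s = rng.standard_normal((batch, state_dim))
a = rng.standard_normal((batch, action_dim))
s_next = rng.standard_normal((batch, state_dim))

rewards = estimate_reward(s, a, s_next,
                          expert_transition, dynamics, inverse_action)
print(rewards.shape)
```

In a full pipeline, rewards of this kind would be handed to a maximum entropy reinforcement learning algorithm, whose entropy term supplies the remaining part of the deviation measure. The choice of that algorithm is left open here.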