US 12,260,302 B2
Method and apparatus for performing learning from demonstrations, particularly imitation learning
Philipp Geiger, Leonberg (DE); and Seyed Jalal Etesami, Lausanne (CH)
Assigned to ROBERT BOSCH GMBH, Stuttgart (DE)
Filed by Robert Bosch GmbH, Stuttgart (DE)
Filed on Dec. 11, 2020, as Appl. No. 17/119,523.
Claims priority of application No. 19218664 (EP), filed on Dec. 20, 2019.
Prior Publication US 2021/0192391 A1, Jun. 24, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 16/901 (2019.01); G06F 17/18 (2006.01)

CPC G06N 20/00 (2019.01) [G06F 16/9024 (2019.01); G06F 17/18 (2013.01)]

17 Claims

1. A computer-implemented method for performing Learning from Demonstrations based on data associated with a source domain, the method comprising:

performing, by a demonstrator, physical actions;

recording, during the performing of the physical actions by the demonstrator, the physical actions of the demonstrator, using sensors of the demonstrator and/or sensors of at least one spectator;

determining first data characterizing the demonstrator of the source domain, wherein the first data characterizes sensor data of the demonstrator and/or sensor data of the at least one spectator observing the demonstrator;

determining first knowledge from the source domain based on the first data;

transferring at least a part of the first knowledge to a second domain;

determining a conditional probability distribution over actions given an observation in the second domain, such that a target agent associated with the second domain behaves similarly to the demonstrator of the source domain;

modeling the source domain using a first directed acyclic graph (DAG), and/or modeling the second domain using a second DAG; and

performing: a) characterizing one or more aspects of at least one of the first DAG and the second DAG with the equation:

wherein P_S(z, a, y_S) characterizes a joint probability distribution in the source domain of an outcome z related to an action a and the spectator's observation in the source domain y_S, where S represents the source domain, wherein the operator symbol in the equation characterizes a sum operator, in a case of discrete domains, or an integral operator, in a case of continuous domains, wherein P_S(y_S|x) characterizes a conditional probability distribution in the source domain of the spectator's observation in the source domain y_S given a state x, and P_S(z, a, x) characterizes a joint probability distribution in the source domain of the outcome z related to the action a and the state x, wherein P(z|a, x) characterizes a conditional probability distribution of the outcome z given the action a and the state x, wherein π_D(a|Y_D) characterizes a policy of the demonstrator, wherein the policy of the demonstrator is a conditional distribution of the action a given an input of the demonstrator Y_D, where D represents the demonstrator, and wherein P_S(Y_D, x) characterizes a joint probability distribution in the source domain of the input of the demonstrator Y_D and the state x.
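
For illustration only, the quantities defined in the claim can be combined numerically in the discrete case. The sketch below assumes one reading that is consistent with those definitions but is not quoted from the claim: P_S(a, x) is obtained by summing π_D(a|Y_D) P_S(Y_D, x) over Y_D, P_S(z, a, x) is the product P(z|a, x) P_S(a, x), and P_S(z, a, y_S) is obtained by summing P_S(y_S|x) P_S(z, a, x) over x. The function names `random_conditional` and `source_joint`, the array layouts, and the domain sizes are hypothetical and chosen only for this example.

```python
import numpy as np


def random_conditional(rng, shape, axis=0):
    """Return a random non-negative table normalized to sum to 1 along `axis`."""
    table = rng.random(shape)
    return table / table.sum(axis=axis, keepdims=True)


def source_joint(p_z_given_ax, pi_d, p_yd_x, p_ys_given_x):
    """Compute P_S(z, a, y_S) for discrete domains (assumed factorization).

    p_z_given_ax : shape (Z, A, X),  P(z | a, x)
    pi_d         : shape (A, YD),    pi_D(a | y_D)
    p_yd_x       : shape (YD, X),    P_S(y_D, x)
    p_ys_given_x : shape (YS, X),    P_S(y_S | x)
    Returns an array of shape (Z, A, YS).
    """
    # P_S(a, x) = sum over y_D of pi_D(a | y_D) * P_S(y_D, x)
    p_ax = pi_d @ p_yd_x
    # P_S(z, a, x) = P(z | a, x) * P_S(a, x)
    p_zax = p_z_given_ax * p_ax[None, :, :]
    # P_S(z, a, y_S) = sum over x of P_S(y_S | x) * P_S(z, a, x)
    return np.einsum("sx,zax->zas", p_ys_given_x, p_zax)


# Hypothetical domain sizes, for illustration only.
Z, A, X, YD, YS = 3, 2, 4, 5, 6
rng = np.random.default_rng(0)

p_z_given_ax = random_conditional(rng, (Z, A, X), axis=0)
pi_d = random_conditional(rng, (A, YD), axis=0)
p_yd_x = rng.random((YD, X))
p_yd_x /= p_yd_x.sum()  # joint distribution over (y_D, x)
p_ys_given_x = random_conditional(rng, (YS, X), axis=0)

joint = source_joint(p_z_given_ax, pi_d, p_yd_x, p_ys_given_x)
```

Under these assumptions the result is a valid joint distribution over (z, a, y_S): its entries are non-negative and sum to one. For continuous domains the sums would be replaced by integrals, which this sketch does not attempt.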