US 12,260,302 B2
Method and apparatus for performing learning from demonstrations, particularly imitation learning
Philipp Geiger, Leonberg (DE); and Seyed Jalal Etesami, Lausanne (CH)
Assigned to ROBERT BOSCH GMBH, Stuttgart (DE)
Filed by Robert Bosch GmbH, Stuttgart (DE)
Filed on Dec. 11, 2020, as Appl. No. 17/119,523.
Claims priority of application No. 19218664 (EP), filed on Dec. 20, 2019.
Prior Publication US 2021/0192391 A1, Jun. 24, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 16/901 (2019.01); G06F 17/18 (2006.01)
CPC G06N 20/00 (2019.01) [G06F 16/9024 (2019.01); G06F 17/18 (2013.01)] 17 Claims
 
1. A computer-implemented method for performing Learning from Demonstrations based on data associated with a source domain, the method comprising:
performing, by a demonstrator, physical actions;
recording, during the performing of the physical actions by the demonstrator, the physical actions of the demonstrator, using sensors of the demonstrator and/or sensors of at least one spectator;
determining first data characterizing the demonstrator of the source domain, wherein the first data characterizes sensor data of the demonstrator and/or sensor data of the at least one spectator observing the demonstrator;
determining first knowledge from the source domain based on the first data;
transferring at least a part of the first knowledge to a second domain;
determining a conditional probability distribution over actions given an observation in the second domain, such that a target agent associated with the second domain behaves similarly to the demonstrator of the source domain;
modeling the source domain using a first directed acyclic graph (DAG), and/or modeling the second domain using a second DAG; and
performing: a) characterizing one or more aspects of at least one of the first DAG and the second DAG with the equation:

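$$P_S(z, a, y_S) = \sum_{x} P_S(y_S \mid x)\, P_S(z, a, x) = \sum_{x} P_S(y_S \mid x) \sum_{Y_D} P(z \mid a, x)\, \pi_D(a \mid Y_D)\, P_S(Y_D, x)$$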
wherein P_S(z, a, y_S) characterizes a joint probability distribution in the source domain of an outcome z related to an action a and the spectator's observation in the source domain y_S, where S represents the source domain, wherein Σ characterizes a sum operator, in a case of discrete domains, or an integral operator, in a case of continuous domains, wherein P_S(y_S|x) characterizes a conditional probability distribution in the source domain of the spectator's observation in the source domain y_S given a state x, and P_S(z, a, x) characterizes a joint probability distribution in the source domain of the outcome z related to the action a and the state x, wherein P(z|a, x) characterizes a conditional probability distribution of the outcome z given the action a and the state x, wherein π_D(a|Y_D) characterizes a policy of the demonstrator, wherein the policy of the demonstrator is a conditional distribution of the action a given an input of the demonstrator Y_D, where D represents the demonstrator, and wherein P_S(Y_D, x) characterizes a joint probability distribution in the source domain of the input of the demonstrator Y_D and the state x.
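As a minimal sketch of the factorization in the claim for discrete domains (where the operator is a sum), the Python below computes P_S(z, a, y_S) both through the nested sums over x and Y_D and by brute-force marginalization of a full joint distribution, then checks that the two agree. Everything concrete here is a hypothetical assumption for illustration and is not taken from the patent: the DAG in `PARENTS` (state x feeding both sensors, the demonstrator acting on its input Y_D, the outcome z depending on a and x), the binary variable domains, all probability values, and the helper names such as `p_zays_factorized`.

```python
import itertools
import math
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical source-domain DAG (node -> parents), assumed so that it yields the
# claimed factorization; not taken from the patent.
PARENTS = {"x": [], "Y_D": ["x"], "a": ["Y_D"], "z": ["a", "x"], "y_S": ["x"]}
# static_order() raises graphlib.CycleError if the graph contains a cycle.
ORDER = list(TopologicalSorter(PARENTS).static_order())

# Hypothetical binary domains and made-up conditional probability tables.
X = YD = A = Z = YS = (0, 1)
p_x = {0: 0.6, 1: 0.4}                                                     # P_S(x)
p_yd_given_x = {(yd, x): (0.9 if yd == x else 0.1) for yd in YD for x in X}  # P_S(Y_D | x)
pi_d = {(a, yd): (0.8 if a == yd else 0.2) for a in A for yd in YD}          # pi_D(a | Y_D)
p_z_given_ax = {(z, a, x): (0.95 if z == (a ^ x) else 0.05)
                for z in Z for a in A for x in X}                            # P(z | a, x)
p_ys_given_x = {(ys, x): (0.7 if ys == x else 0.3) for ys in YS for x in X}  # P_S(y_S | x)


def p_yd_x(yd, x):
    """Joint P_S(Y_D, x) = P_S(x) * P_S(Y_D | x)."""
    return p_x[x] * p_yd_given_x[(yd, x)]


def p_zax(z, a, x):
    """Inner sum: P_S(z, a, x) = sum over Y_D of P(z|a,x) pi_D(a|Y_D) P_S(Y_D, x)."""
    return sum(p_z_given_ax[(z, a, x)] * pi_d[(a, yd)] * p_yd_x(yd, x) for yd in YD)


def p_zays_factorized(z, a, ys):
    """Outer sum: P_S(z, a, y_S) = sum over x of P_S(y_S|x) P_S(z, a, x)."""
    return sum(p_ys_given_x[(ys, x)] * p_zax(z, a, x) for x in X)


def p_zays_bruteforce(z, a, ys):
    """Reference: marginalize x and Y_D out of the product of each node's CPT given its parents."""
    total = 0.0
    for x, yd in itertools.product(X, YD):
        total += (p_x[x] * p_yd_given_x[(yd, x)] * pi_d[(a, yd)]
                  * p_z_given_ax[(z, a, x)] * p_ys_given_x[(ys, x)])
    return total


if __name__ == "__main__":
    for z, a, ys in itertools.product(Z, A, YS):
        assert math.isclose(p_zays_factorized(z, a, ys), p_zays_bruteforce(z, a, ys))
    print(ORDER[0])  # x
    print(round(sum(p_zays_factorized(*v) for v in itertools.product(Z, A, YS)), 10))  # 1.0
```

For continuous domains the sums would become integrals, typically approximated numerically or by sampling; that case is not covered by this sketch.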