| CPC G06N 20/00 (2019.01) [G06F 16/9024 (2019.01); G06F 17/18 (2013.01)] | 17 Claims |

1. A computer-implemented method for performing Learning from Demonstrations based on data associated with a source domain, the method comprising:
performing, by a demonstrator, physical actions;
recording, during the performing of the physical actions by the demonstrator, the physical actions of the demonstrator, using sensors of the demonstrator and/or sensors of at least one spectator;
determining first data characterizing the demonstrator of the source domain, wherein the first data characterizes sensor data of the demonstrator and/or sensor data of the at least one spectator observing the demonstrator;
determining first knowledge from the source domain based on the first data;
transferring at least a part of the first knowledge to a second domain;
determining a conditional probability distribution over actions given an observation in the second domain, such that a target agent associated with the second domain behaves similarly to the demonstrator of the source domain;
modeling the source domain using a first directed acyclic graph (DAG), and/or modeling the second domain using a second DAG; and
performing: a) characterizing one or more aspects of at least one of the first DAG and the second DAG with the equation:
PS(z, a, yS) = Σx PS(yS|x) PS(z, a, x) = Σx PS(yS|x) P(z|a, x) πD(a|YD) PS(YD, x),
wherein PS(z, a, yS) characterizes a joint probability distribution in the source domain of an outcome z related to an action a and the spectator's observation in the source domain yS, where S represents the source domain, wherein Σx characterizes a sum operator over the state x, in a case of discrete domains, or an integral operator, in a case of continuous domains, wherein PS(yS|x) characterizes a conditional probability distribution in the source domain of the spectator's observation in the source domain yS given a state x, and PS(z, a, x) characterizes a joint probability distribution in the source domain of the outcome z related to the action a and the state x, wherein P(z|a, x) characterizes a conditional probability distribution of the outcome z given the action a and the state x, wherein πD(a|YD) characterizes a policy of the demonstrator, wherein the policy of the demonstrator is a conditional distribution of the action a given an input of the demonstrator YD, where D represents the demonstrator, and wherein PS(YD, x) characterizes a joint probability distribution in the source domain of the input of the demonstrator YD and the state x.
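
As an illustrative aside, not part of the claim language, the factorization above can be checked numerically in the discrete case. The sketch below is a minimal, hypothetical example: it assumes small finite spaces for the state x, the spectator observation yS, the demonstrator input YD, the action a, and the outcome z, fills the factors PS(yS|x), P(z|a, x), πD(a|YD), and PS(YD, x) with random tables, and marginalizes over x (and, as an additional assumption, over YD) to obtain the joint PS(z, a, yS). All variable names, dimensions, and the extra marginalization over YD are assumptions made for this sketch and are not taken from the patent.

```python
import numpy as np

# Hypothetical discrete-case sketch of the factorization
#   PS(z, a, yS) = Σx PS(yS|x) P(z|a, x) πD(a|YD) PS(YD, x),
# with an additional (assumed) marginalization over the demonstrator input YD.
# All sizes and array names below are illustrative, not from the patent.

rng = np.random.default_rng(0)
n_x, n_ys, n_yd, n_a, n_z = 4, 3, 3, 2, 2  # state, spectator obs., demonstrator input, action, outcome

def random_conditional(shape, axis=0):
    """Random nonnegative table normalized along `axis`, i.e. a conditional distribution."""
    table = rng.random(shape)
    return table / table.sum(axis=axis, keepdims=True)

p_ys_given_x = random_conditional((n_ys, n_x))      # PS(yS | x)
p_z_given_ax = random_conditional((n_z, n_a, n_x))  # P(z | a, x)
pi_d = random_conditional((n_a, n_yd))              # πD(a | YD), the demonstrator policy
p_yd_x = rng.random((n_yd, n_x))
p_yd_x /= p_yd_x.sum()                              # PS(YD, x), a joint distribution

# Marginalize over the state x (and, by assumption, over YD) to obtain PS(z, a, yS).
# einsum indices: z = outcome, a = action, s = yS, d = YD, x = state.
p_z_a_ys = np.einsum('sx,zax,ad,dx->zas', p_ys_given_x, p_z_given_ax, pi_d, p_yd_x)

assert np.isclose(p_z_a_ys.sum(), 1.0)  # a proper joint distribution sums to one
print(p_z_a_ys.shape)                   # (n_z, n_a, n_ys)
```

In the continuous case the claim replaces the sum operator with an integral operator; the same structure would then typically be approximated with samples or parametric density models rather than the exhaustive tables used in this sketch.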