US 12,380,360 B2
Interpretable imitation learning via prototypical option discovery for decision making
Wenchao Yu, Plainsboro, NJ (US); Haifeng Chen, West Windsor, NJ (US); and Wei Cheng, Princeton Junction, NJ (US)
Assigned to NEC Corporation, Tokyo (JP)
Filed by NEC Laboratories America, Inc., Princeton, NJ (US)
Filed on May 18, 2021, as Appl. No. 17/323,475.
Claims priority of provisional application 63/029,754, filed on May 26, 2020.
Claims priority of provisional application 63/033,304, filed on Jun. 2, 2020.
Prior Publication US 2021/0374612 A1, Dec. 2, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 16/951 (2019.01); G06N 20/10 (2019.01); G06N 20/20 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 16/951 (2019.01); G06N 20/10 (2019.01); G06N 20/20 (2019.01)]
18 Claims
 
1. A method for learning prototypical options for interpretable imitation learning, the method comprising:
initializing options by bottleneck state discovery, each of the options presented by an instance of trajectories generated by experts;
applying segmentation embedding learning to extract features to represent current states in segmentations by dividing the trajectories into a set of segmentations;
learning prototypical options for each segment of the set of segmentations to mimic expert policies by minimizing loss of a policy and projecting prototypes to the current states;
learning prototypical option embedding using an objective function:

[Objective function: rendered as an image in the original and not reproduced here]
where L_IL is an imitation learning loss, f_φ in the second term is a segment representation function for a segment s_{ν_m}^{ν_m′} from segment ν_m to segment ν_m′, e_i and e_j are embedded prototypes, K is a number of prototypes, M is a number of segments, d_min is a threshold value, and λ_1, λ_2, and λ_3 are weighting parameters;
training option policy with imitation learning techniques to learn a conditional policy;
generating interpretable policies by comparing the current states in the segmentations to one or more prototypical option embeddings;
generating dosage options for a patient based on the interpretable policies;
displaying the dosage options on a user interface for a user; and
taking an action based on the dosage options.
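
Illustrative sketch (not part of the claim). The objective function of claim 1 is not reproduced in this text, so the code below is only one plausible reading of the symbols the claim defines: an imitation learning loss plus three terms weighted by λ_1, λ_2, and λ_3. The form of each term, the assignment of weights to terms, the squared Euclidean distance, and the names `pairwise_sq_dists`, `prototype_objective`, and `nearest_prototype` are all assumptions made for illustration. Segment embeddings stand in for the outputs of f_φ, and the last function shows how a segment could be compared with the prototypical option embeddings to pick the closest one.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a (N, D) and rows of b (K, D)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def prototype_objective(il_loss, seg_emb, protos, d_min, lam1, lam2, lam3):
    """Hypothetical prototype-learning objective (assumed form, not the patented one).

    il_loss : scalar imitation learning loss (L_IL)
    seg_emb : (M, D) segment embeddings, standing in for f_phi outputs
    protos  : (K, D) embedded prototypes e_1..e_K
    d_min   : minimum desired distance between two prototypes
    """
    d = pairwise_sq_dists(seg_emb, protos)  # shape (M, K)

    # Assumed term 1: every prototype lies close to at least one segment.
    proto_to_segment = d.min(axis=0).sum()

    # Assumed term 2: every segment lies close to at least one prototype.
    segment_to_proto = d.min(axis=1).sum()

    # Assumed term 3: penalise prototype pairs that are closer than d_min.
    proto_dist = np.sqrt(pairwise_sq_dists(protos, protos))
    i, j = np.triu_indices(len(protos), k=1)  # each unordered pair once
    diversity = np.sum(np.maximum(0.0, d_min - proto_dist[i, j]) ** 2)

    return (il_loss
            + lam1 * proto_to_segment
            + lam2 * segment_to_proto
            + lam3 * diversity)


def nearest_prototype(seg_emb, protos):
    """Index of the closest prototype for each segment embedding."""
    return pairwise_sq_dists(seg_emb, protos).argmin(axis=1)


if __name__ == "__main__":
    protos = np.array([[0.0, 0.0], [3.0, 4.0]])
    seg_emb = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 3.0]])
    print(prototype_objective(2.0, seg_emb, protos, d_min=6.0,
                              lam1=1.0, lam2=1.0, lam3=1.0))
    print(nearest_prototype(seg_emb, protos))
```

In this reading, the nearest-prototype index is what makes a policy decision explainable: each segment is attributed to the expert-derived prototype it most resembles. The way the claimed method actually performs that comparison may differ.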