US 12,380,360 B2
Interpretable imitation learning via prototypical option discovery for decision making
Wenchao Yu, Plainsboro, NJ (US); Haifeng Chen, West Windsor, NJ (US); and Wei Cheng, Princeton Junction, NJ (US)
Assigned to NEC Corporation, Tokyo (JP)
Filed by NEC Laboratories America, Inc., Princeton, NJ (US)
Filed on May 18, 2021, as Appl. No. 17/323,475.
Claims priority of provisional application 63/029,754, filed on May 26, 2020.
Claims priority of provisional application 63/033,304, filed on Jun. 2, 2020.
Prior Publication US 2021/0374612 A1, Dec. 2, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 16/951 (2019.01); G06N 20/10 (2019.01); G06N 20/20 (2019.01)

CPC G06N 20/00 (2019.01) [G06F 16/951 (2019.01); G06N 20/10 (2019.01); G06N 20/20 (2019.01)]

18 Claims

1. A method for learning prototypical options for interpretable imitation learning, the method comprising:

initializing options by bottleneck state discovery, each of the options presented by an instance of trajectories generated by experts;

applying segmentation embedding learning to extract features to represent current states in segmentations by dividing the trajectories into a set of segmentations;

learning prototypical options for each segment of the set of segmentations to mimic expert policies by minimizing loss of a policy and projecting prototypes to the current states;

learning prototypical option embedding using an objective function:

where L_IL_loss is an imitation learning loss, f_φ is, the second term is a segment representation function for a segment s_{ν_m′,ν_m} from segment ν_m to segment ν_m′, e_i and e_j are embedded prototypes, K is a number of prototypes, M is a number of segments, d_min is a threshold value, and λ₁, λ₂, and λ₃ are weighting parameters;

training option policy with imitation learning techniques to learn a conditional policy;

generating interpretable policies by comparing the current states in the segmentations to one or more prototypical option embeddings;

generating dosage options for a patient based on the interpretable policies;

displaying the dosage options on a user interface for a user; and

taking an action based on the dosage options.
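
For illustration only, the segmentation and prototype-comparison steps recited above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the claimed method: the helper names (`split_at_bottlenecks`, `embed_segment`, `nearest_prototype`) are hypothetical, the bottleneck indices are taken as given rather than discovered, a simple mean over states stands in for the learned segment representation function f_φ, and Euclidean distance is assumed as the comparison measure.

```python
import numpy as np


def split_at_bottlenecks(trajectory, bottleneck_idx):
    """Divide a (T, D) trajectory into segments at the given time indices.

    The indices are assumed to be supplied; bottleneck discovery itself
    is not implemented here.
    """
    cuts = [0] + sorted(bottleneck_idx) + [len(trajectory)]
    # Skip empty segments that arise from duplicate or boundary indices.
    return [trajectory[a:b] for a, b in zip(cuts[:-1], cuts[1:]) if b > a]


def embed_segment(segment):
    """Hypothetical stand-in for f_phi: the mean state of the segment."""
    return segment.mean(axis=0)


def nearest_prototype(embedding, prototypes):
    """Return (index, distance) of the closest prototype embedding."""
    dists = np.linalg.norm(prototypes - embedding, axis=1)
    k = int(np.argmin(dists))
    return k, float(dists[k])


# Toy expert trajectory with five 2-D states and one assumed bottleneck at t=2.
trajectory = np.array([
    [0.0, 0.0],
    [0.2, 0.0],
    [1.0, 1.0],
    [1.2, 1.0],
    [1.0, 1.2],
])
# K = 2 prototype embeddings, chosen by hand for the example.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])

segments = split_at_bottlenecks(trajectory, [2])
assignments = [
    nearest_prototype(embed_segment(seg), prototypes)[0] for seg in segments
]
print(assignments)
```

Each segment is labeled with the index of its closest prototype, which is the sense in which a prototype-based policy can be read as "this situation resembles prototype k". In a trained system the embedding function and the prototypes would be learned jointly rather than fixed by hand as they are here.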