US 11,657,333 B1
Interpretability of deep reinforcement learning models in assistant systems
Honglei Liu, San Mateo, CA (US); Pararth Paresh Shah, Sunnyvale, CA (US); Wenxuan Li, Mountain View, CA (US); Wenhai Yang, Mountain View, CA (US); and Anuj Kumar, Santa Clara, CA (US)
Assigned to Meta Platforms Technologies, LLC, Menlo Park, CA (US)
Filed by Meta Platforms Technologies, LLC, Menlo Park, CA (US)
Filed on Apr. 19, 2019, as Appl. No. 16/389,769.
Claims priority of provisional application 62/750,746, filed on Oct. 25, 2018.
Claims priority of provisional application 62/660,876, filed on Apr. 20, 2018.
Int. Cl. G06N 20/20 (2019.01); G06Q 50/00 (2012.01); G06N 3/08 (2006.01); G06F 18/214 (2023.01); G06N 3/045 (2023.01)
CPC G06N 20/20 (2019.01) [G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06Q 50/01 (2013.01)]
20 Claims
1. A method comprising, by one or more computing systems:
training a target machine-learning model iteratively by:
accessing a plurality of training data, wherein each training data comprises a content object;
training an intermediate machine-learning model based on the plurality of training data, wherein the intermediate machine-learning model outputs one or more contextual evaluation measurements;
generating a plurality of state-indications associated with the plurality of training data, respectively, wherein the plurality of state-indications comprise one or more user-intents, one or more system actions, and one or more user actions;
training the target machine-learning model based on the one or more contextual evaluation measurements, the plurality of state-indications, and an action set comprising a plurality of possible system actions;
extracting, by a sequential pattern-mining model, a plurality of rules based on the target machine-learning model;
generating, based on the plurality of rules, a plurality of synthetic training data;
updating the plurality of training data by adding the plurality of synthetic training data to the plurality of training data;
determining if a completion condition is reached for the training of the target machine-learning model; and
based on the determining:
if the completion condition is reached, then returning the target machine-learning model; else
if the completion condition is not reached, then repeating the iterative training of the target machine-learning model.
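
For illustration only, the control flow recited in claim 1 can be sketched in Python as below. This is a minimal toy under stated assumptions, not the patented implementation: every function name, the dictionary-based "policy", the constant scores standing in for contextual evaluation measurements, the pair-counting stand-in for sequential pattern mining, the `min_support` threshold, and the "policy stopped changing" completion condition are hypothetical choices that the claim does not specify.

```python
from collections import Counter, defaultdict


def train_intermediate(data):
    # Hypothetical stand-in for the intermediate model: one constant
    # "contextual evaluation measurement" per training example.
    return [1.0 for _ in data]


def make_state_indications(data):
    # Assumption: each example is already a sequence of state-indication
    # strings (user intents, system actions, user actions).
    return [tuple(example) for example in data]


def train_target(scores, states, action_set):
    # Toy "target model": for each state-indication, pick the system action
    # that follows it with the highest total score.
    votes = defaultdict(Counter)
    for score, seq in zip(scores, states):
        for prev, nxt in zip(seq, seq[1:]):
            if nxt in action_set:
                votes[prev][nxt] += score
    return {prev: c.most_common(1)[0][0] for prev, c in votes.items()}


def extract_rules(policy, states, min_support=2):
    # Toy stand-in for sequential pattern mining: keep (state, action) pairs
    # that agree with the policy and occur in at least min_support places.
    support = Counter()
    for seq in states:
        for prev, nxt in zip(seq, seq[1:]):
            if policy.get(prev) == nxt:
                support[(prev, nxt)] += 1
    return sorted(rule for rule, n in support.items() if n >= min_support)


def synthesize(rules):
    # Each rule becomes one short synthetic training example.
    return [list(rule) for rule in rules]


def train_iteratively(data, action_set, max_rounds=3):
    data = list(data)
    policy = {}
    for _ in range(max_rounds):
        scores = train_intermediate(data)
        states = make_state_indications(data)
        new_policy = train_target(scores, states, action_set)
        rules = extract_rules(new_policy, states)
        data.extend(synthesize(rules))   # update the training data
        done = new_policy == policy      # hypothetical completion condition
        policy = new_policy
        if done:
            break
    return policy, data


# Hypothetical example dialogs, invented for this sketch.
dialogs = [
    ["intent:weather", "sys:ask_location", "user:give_location", "sys:answer"],
    ["intent:weather", "sys:ask_location", "user:give_location", "sys:answer"],
    ["intent:weather", "sys:answer"],
]
actions = {"sys:ask_location", "sys:answer"}
policy, data = train_iteratively(dialogs, actions)
```

In this sketch the loop stops once two consecutive rounds yield the same policy, and `max_rounds` bounds the iteration if they never do. A real system would presumably substitute learned models and a proper pattern-mining algorithm at each step.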