US 11,836,590 B2
User intent classification using a multi-agent reinforcement learning framework
Puneet Mehta, San Mateo, CA (US); Shobhit Agrawal, Gurugram (IN); and Nishant Pandey, Gurugram (IN)
Assigned to AI Netomi, Inc., San Mateo, CA (US)
Filed by AI Netomi, Inc., San Mateo, CA (US)
Filed on Dec. 3, 2019, as Appl. No. 16/702,039.
Claims priority of provisional application 62/774,790, filed on Dec. 3, 2018.
Prior Publication US 2020/0184383 A1, Jun. 11, 2020
Int. Cl. G06N 20/20 (2019.01); G06F 40/284 (2020.01); G06N 5/043 (2023.01); G06N 3/04 (2023.01); G06F 40/30 (2020.01)
CPC G06N 20/20 (2019.01) [G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06N 3/04 (2013.01); G06N 5/043 (2013.01)]
12 Claims
 
1. A system for training machine agents to determine a user intent expressed in a document, comprising:
a plurality of agents, wherein each agent is configured to (i) provide, upon receiving a current state and a current token, the token being one of a plurality of portions extracted from the document, a current prediction of the user intent based on a policy that acts on the current state and the current token, (ii) receive a current metric in response to providing the current prediction, and (iii) modify the policy based on one or more of the current metrics already received; and
an environment element that extracts the tokens from the document, wherein the environment element is configured to (i) determine a next state and a next token from the extracted tokens, (ii) provide to each agent the next state as the current state and the next token as the current token, (iii) from each of the agents, receive that agent's current prediction of the user intent, and (iv) to each of the agents, provide the current metric to that agent based on comparing that agent's current prediction against a predetermined intent, and wherein the environment element determines each next state, other than an initial state, based on the current predictions of the agents, one or more of the current states and all the current tokens already provided to the agents.
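
The interaction recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the `Agent` and `Environment` classes, the `INTENTS` label set, whitespace tokenization, the tabular scoring policy, the +1/-1 metric, and the state encoding (the tokens already provided plus the agents' latest predictions; earlier states are not folded in separately).

```python
from collections import defaultdict

INTENTS = ["greeting", "refund"]  # hypothetical label set


class Agent:
    """Tabular agent: the policy scores each intent for a (state, token) pair."""

    def __init__(self, intents, learning_rate):
        self.intents = list(intents)
        self.learning_rate = learning_rate
        self.scores = defaultdict(float)  # (state, token, intent) -> score
        self.last_key = None

    def predict(self, state, token):
        # The policy acts on the current state and the current token;
        # ties are broken by the order of self.intents.
        best = max(self.intents, key=lambda i: self.scores[(state, token, i)])
        self.last_key = (state, token, best)
        return best

    def learn(self, metric):
        # Modify the policy using the metric received for the last prediction.
        self.scores[self.last_key] += self.learning_rate * metric


class Environment:
    """Extracts tokens, feeds them to the agents, and scores their predictions."""

    def __init__(self, document, predetermined_intent):
        self.tokens = document.lower().split()  # assumed tokenization
        self.intent = predetermined_intent

    def run_episode(self, agents):
        state = ((), ())  # initial state: no tokens provided, no predictions
        predictions = []
        for index, token in enumerate(self.tokens):
            predictions = [agent.predict(state, token) for agent in agents]
            for agent, prediction in zip(agents, predictions):
                # Metric: compare the prediction against the predetermined intent.
                metric = 1.0 if prediction == self.intent else -1.0
                agent.learn(metric)
            # Next state: tokens provided so far plus the agents' predictions.
            state = (tuple(self.tokens[: index + 1]), tuple(predictions))
        return predictions


agents = [Agent(INTENTS, 0.5), Agent(INTENTS, 0.25)]
env = Environment("I want my money back", "refund")
for _ in range(10):
    final_predictions = env.run_episode(agents)
print(final_predictions)
```

In this sketch each agent starts out predicting the first label in `INTENTS`, receives a negative metric, and lowers that label's score for the visited state and token, so repeated passes over the same document move both agents toward the predetermined intent. A practical system would replace the lookup table with a learned model and train over many documents.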