| CPC G06F 40/35 (2020.01) [G06N 3/08 (2013.01); G06F 16/3329 (2019.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01)] | 20 Claims |

|
1. A method, comprising:
performing, by a computer system, an iterative training operation to train a deep Q-learning network (“DQN”) based on conversation log information corresponding to a plurality of prior conversations, wherein the DQN includes:
an input layer to receive an input value indicative of a current state of a given conversation;
one or more hidden layers; and
an output layer that includes a plurality of output nodes corresponding to a plurality of available responses;
wherein, for a first conversation log corresponding to a first one of the plurality of prior conversations, the iterative training operation includes:
determining a current state of the first prior conversation based on a first user utterance, wherein determining the current state of the first prior conversation includes, for the first user utterance, identifying a first cluster of user utterances from a plurality of clusters of user utterances, the plurality of clusters of user utterances associated with the plurality of prior conversations partitioned into the plurality of clusters;
generating a first input value to the DQN based on the current state of the first prior conversation;
applying the first input value to the DQN to identify a first response, from the plurality of available responses, to provide to the first user utterance; and
updating the DQN based on a first reward value provided based on the first response; and
repeating, by the computer system, the iterative training operation using a second conversation log corresponding to a second one of the plurality of prior conversations.
|
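The training operation recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that each user utterance is already available as a numeric vector, that the clusters are represented by precomputed centroids, that the input value is a one-hot encoding of the identified cluster, and that the reward is 1.0 when the selected response matches the response recorded in the log. The names `nearest_cluster`, `one_hot`, `TinyDQN`, `train_on_log`, and `match_reward` are hypothetical, as are the discount factor, the epsilon-greedy selection, and the one-step Q-learning target, none of which the claim specifies.

```python
import math
import random

def nearest_cluster(utterance_vec, centroids):
    """Index of the centroid closest (squared Euclidean) to the utterance vector."""
    dists = [sum((u - c) ** 2 for u, c in zip(utterance_vec, cen)) for cen in centroids]
    return dists.index(min(dists))

def one_hot(index, size):
    """Encode a cluster index as the network input value (assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]

class TinyDQN:
    """Input layer -> one tanh hidden layer -> one output node per available response."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.lr = lr

    def forward(self, x):
        """Return hidden activations and one Q-value per available response."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        q = [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]
        return h, q

    def update(self, x, action, target):
        """One gradient step on 0.5 * (Q(x, action) - target)^2."""
        h, q = self.forward(x)
        err = q[action] - target
        # Back-propagate through the selected output node only.
        grad_h = [err * self.w2[action][j] * (1.0 - h[j] ** 2) for j in range(len(h))]
        for j in range(len(h)):
            self.w2[action][j] -= self.lr * err * h[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= self.lr * grad_h[j] * x[i]

def train_on_log(net, log, centroids, reward_fn, gamma=0.9, epsilon=0.1, rng=random):
    """Run the training operation over one conversation log.

    `log` is a list of (utterance_vector, logged_response_index) pairs.
    """
    for t, (utt_vec, logged_response) in enumerate(log):
        # Determine the current state by identifying the utterance's cluster.
        state = one_hot(nearest_cluster(utt_vec, centroids), len(centroids))
        # Apply the input value to the network to identify a response.
        _, q = net.forward(state)
        if rng.random() < epsilon:
            action = rng.randrange(len(q))   # occasional exploration (assumption)
        else:
            action = q.index(max(q))         # greedy response
        # Reward provided based on the selected response.
        reward = reward_fn(action, logged_response)
        target = reward
        if t + 1 < len(log):
            nxt = one_hot(nearest_cluster(log[t + 1][0], centroids), len(centroids))
            target += gamma * max(net.forward(nxt)[1])
        # Update the network based on the reward.
        net.update(state, action, target)

# Toy data (hypothetical): two clusters, three available responses, two logs.
centroids = [[1.0, 0.0], [0.0, 1.0]]
logs = [
    [([0.9, 0.1], 2), ([0.2, 0.8], 0)],
    [([0.1, 0.9], 0)],
]
net = TinyDQN(n_in=len(centroids), n_hidden=4, n_out=3)
rng = random.Random(0)
match_reward = lambda chosen, logged: 1.0 if chosen == logged else 0.0

# Repeat the training operation for each conversation log in turn.
for log in logs:
    train_on_log(net, log, centroids, match_reward, rng=rng)
```

In this sketch the outer loop over `logs` corresponds to repeating the operation with a second conversation log. A practical system would likely use a learned utterance embedding, a clustering step to produce the centroids, and a deep-learning framework in place of the hand-written network.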