| CPC G10L 15/063 (2013.01) [G10L 15/22 (2013.01); G10L 15/30 (2013.01); H04L 67/133 (2022.05); H04L 67/53 (2022.05); H04M 3/5166 (2013.01); G10L 2015/0635 (2013.01); G10L 2015/228 (2013.01)] | 13 Claims |

|
1. A method implemented by one or more processors, the method comprising:
  obtaining, via a voice bot development platform, a plurality of remote procedure call (RPC) outbound training instances, each of the plurality of RPC outbound training instances including:
    training instance input, the training instance input including at least a portion of a corresponding conversation and a prior context of the corresponding conversation, and
    training instance output, the training instance output including a corresponding ground truth response to at least the portion of the corresponding conversation, wherein the corresponding ground truth response for the training instance output of a given RPC outbound training instance, of the plurality of RPC outbound training instances, comprises at least a corresponding RPC outbound request;
  training, via the voice bot development platform, a voice bot based on at least the plurality of RPC outbound training instances,
    wherein training the voice bot based on the plurality of RPC outbound training instances causes the voice bot to interact with a third-party system,
    wherein training the voice bot based on the given RPC outbound training instance comprises utilizing one or more attention mechanisms to attention the voice bot to generate the corresponding RPC outbound request, to be transmitted to the third-party system, and based on processing at least the portion of the corresponding conversation and the prior context of the corresponding conversation, and
    wherein processing at least the portion of the corresponding conversation and the prior context of the corresponding conversation to train the voice bot comprises:
      processing, using a plurality of machine learning (ML) layers of a ML model, and for the given RPC outbound training instance, at least the portion of the corresponding conversation and the prior context of the corresponding conversation to generate an embedding associated with a current state of the corresponding conversation;
      processing, using a plurality of additional ML layers of the ML model, at least the embedding associated with the current state of the corresponding conversation to generate a predicted embedding associated with a predicted response to at least the portion of the corresponding conversation;
      comparing, in embedding space, the predicted embedding associated with the predicted response to at least the portion of the corresponding conversation and a corresponding ground truth embedding associated with the corresponding RPC outbound request;
      generating, based on comparing the predicted embedding and the corresponding ground truth embedding, one or more losses; and
      updating the ML model based on one or more of the losses; and
  subsequent to training the voice bot:
    causing the trained voice bot to be deployed for conducting conversations on behalf of a third-party.
|
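The training steps recited in claim 1 (embedding the conversation state, predicting a response embedding, comparing it with a ground truth embedding of the RPC outbound request, generating a loss, and updating the model) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the claim: the `RpcOutboundTrainingInstance` class, the hashed bag-of-words `embed_text` function standing in for the first ML layers, the single linear map standing in for the additional ML layers, the squared Euclidean distance used as the comparison, and the example conversation and RPC strings. The attention mechanisms recited in the claim are omitted.

```python
from dataclasses import dataclass

DIM = 8  # toy embedding size (assumption, not from the claim)


@dataclass
class RpcOutboundTrainingInstance:
    """Hypothetical container for one RPC outbound training instance."""
    conversation_portion: str  # training instance input
    prior_context: str         # training instance input
    ground_truth_rpc: str      # training instance output (RPC outbound request)


def embed_text(text, dim=DIM):
    """Toy deterministic bag-of-words embedding (stand-in for learned ML layers)."""
    tokens = text.lower().split()
    vec = [0.0] * dim
    for tok in tokens:
        vec[sum(map(ord, tok)) % dim] += 1.0
    return [v / max(len(tokens), 1) for v in vec]


def predict_embedding(weights, state):
    """Stand-in for the additional ML layers: a single linear map."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def embedding_loss(predicted, target):
    """Compare two embeddings using squared Euclidean distance."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def train_step(weights, instance, lr=0.5):
    """Run one update on one instance and return the loss before the update."""
    # Embedding associated with the current state of the conversation.
    state = embed_text(instance.prior_context + " " + instance.conversation_portion)
    # Ground truth embedding associated with the RPC outbound request.
    target = embed_text(instance.ground_truth_rpc)
    predicted = predict_embedding(weights, state)
    loss = embedding_loss(predicted, target)
    # Gradient of the loss w.r.t. weights[i][j] is 2 * (p_i - t_i) * s_j.
    for i, row in enumerate(weights):
        err = 2.0 * (predicted[i] - target[i])
        for j in range(len(row)):
            row[j] -= lr * err * state[j]
    return loss


# Hypothetical example data; the RPC string format is invented for illustration.
instance = RpcOutboundTrainingInstance(
    conversation_portion="Do you have a table for four at 7 pm?",
    prior_context="caller wants to book a restaurant reservation",
    ground_truth_rpc="rpc_outbound check_availability party_size=4 time=19:00",
)
weights = [[0.0] * DIM for _ in range(DIM)]
losses = [train_step(weights, instance) for _ in range(20)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

In this sketch only the linear map is updated and the loss shrinks on the single instance as the predicted embedding moves toward the ground truth embedding. A real system would presumably learn both sets of layers and train over many instances.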