| CPC G06Q 30/0271 (2013.01) [G06N 20/00 (2019.01); G06Q 50/12 (2013.01)] | 20 Claims |

1. A method of fine-tuning a machine-learned language model, comprising:
obtaining a plurality of training instances for a plurality of users, each training instance including a user request to a service for a respective user, a sequence of activities of the user, one or more responses to the user request, and a score indicating a degree of satisfaction of the user with each of the one or more responses;
generating a user representation for each of the plurality of users by applying a transformer model to a sequence of tokens representing the sequence of activities of the user, wherein the transformer model is configured as an encoder-decoder architecture and is trained by applying one or more masked tokens to one or more training instances;
training an evaluation model coupled to receive a user representation and a response to a user request, and generate an estimated evaluation score, wherein parameters of the evaluation model are trained based on the responses to the user requests and evaluation scores for the responses in the training instances, wherein the evaluation model is configured as a logistic regression model;
fine-tuning a first machine-learned language model to generate a second machine-learned language model, wherein the second machine-learned language model is configured as a transformer architecture including an attention operation, the attention operation coupled to receive input data, generate queries, keys, and values, and generate an attention output from the queries, the keys, and the values, the fine-tuning comprising:
applying the user request and the user representation to the first machine-learned language model to generate a base response to the user request;
applying the user request and the user representation to the second machine-learned language model to generate an estimated response to the user request;
generating an estimated evaluation score by applying the trained evaluation model to the estimated response from the second machine-learned language model;
generating a loss function combining a first loss indicating a divergence between the base response and the estimated response and a second loss indicating the estimated evaluation score;
computing gradients of the loss function with respect to the parameters of the second machine-learned language model; and
updating parameters of the second machine-learned language model by combining the computed gradients of the parameters with current values of the parameters of the second machine-learned language model.
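Outside the claim language, the recited attention operation (input data projected to queries, keys, and values, then combined into an attention output) can be sketched as single-head scaled dot-product attention. This is an illustrative, non-limiting sketch; the function names, the pure-Python matrix representation, and the single-head simplification are assumptions for exposition, not part of the claim.

```python
import math

def matmul(A, B):
    # Naive matrix product of two lists-of-lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    # Numerically stable softmax over one row of attention scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention (illustrative).
    X: n x d input token vectors; Wq/Wk/Wv: d x d projection matrices."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(Q[0])
    # Scores: query-key dot products scaled by sqrt(d).
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
              for q in Q]
    weights = [softmax(row) for row in scores]
    # Attention output: weighted combination of the values.
    return matmul(weights, V)
```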
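The recited logistic regression evaluation model, which maps a user representation and a response to an estimated evaluation score, can be sketched as follows. The concatenated feature vector, the learning rate, and the single-step update are illustrative assumptions; the claim does not specify a particular feature layout or optimizer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def evaluation_score(user_repr, response_repr, weights, bias):
    """Logistic regression: concatenate the user representation and
    response features, then map the linear score to (0, 1)."""
    features = user_repr + response_repr
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)

def sgd_step(weights, bias, user_repr, response_repr, label, lr=0.1):
    """One gradient step on the log-loss for a (response, score) pair,
    as in training the evaluation model on the training instances."""
    features = user_repr + response_repr
    pred = evaluation_score(user_repr, response_repr, weights, bias)
    err = pred - label  # gradient of log-loss w.r.t. the linear score
    new_weights = [w - lr * err * f for w, f in zip(weights, features)]
    new_bias = bias - lr * err
    return new_weights, new_bias
```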
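The fine-tuning loss of the claim combines a first loss indicating divergence between the base response and the estimated response with a second loss indicating the estimated evaluation score; minimizing it keeps the second model close to the first while maximizing predicted user satisfaction. A minimal sketch is below, assuming a KL divergence for the first loss, a beta weighting between the terms, and a plain gradient-descent update, none of which is fixed by the claim.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two token distributions: the first loss,
    a divergence between the estimated and base responses."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def combined_loss(tuned_probs, base_probs, eval_score, beta=0.1):
    """Second loss term rewards a high estimated evaluation score, so it
    enters with a negative sign; beta (assumed) trades off the two terms."""
    return beta * kl_divergence(tuned_probs, base_probs) - eval_score

def update_parameters(params, grads, lr=1e-3):
    """Combine computed gradients with current parameter values
    (plain gradient descent, for illustration)."""
    return [p - lr * g for p, g in zip(params, grads)]
```

With identical distributions the divergence term vanishes and the loss reduces to the negated evaluation score, so gradient descent on this loss pushes the second model toward higher-scoring responses.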