| CPC G06F 16/345 (2019.01) [G06F 16/3329 (2019.01); G10L 15/1815 (2013.01); G10L 15/183 (2013.01)] | 16 Claims |

|
1. A computer-implemented method, comprising:
receiving, from a device, first audio data representing a first spoken input;
determining that the first spoken input corresponds to a request for information related to a first entity;
receiving, from a storage medium, first text data of a document corresponding to the first spoken input, the first text data representing a first plurality of words, the first text data having been stored in the storage medium prior to receipt of the first audio data;
determining first context data corresponding to the first spoken input, the first context data representing at least the first entity;
determining a first representation of the first context data;
processing the first text data using a first encoder of a machine learning (ML) model to generate first encoded data;
processing the first representation using a second encoder of the ML model to generate second encoded data;
processing the first encoded data and the second encoded data using a decoder of the ML model to determine second text data representing a first summary of information described in the document, the second text data corresponding to a second plurality of words including at least one word selected from the first plurality of words, wherein the ML model is configured to cause the decoder to weight one or more elements of the first representation when selecting one or more words from the first plurality of words to include in the second plurality of words so as to cause the first summary to include information corresponding to the first entity;
determining first output data using the second text data; and
sending, to the device, the first output data in response to the first spoken input.
|
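
The data flow recited in claim 1 (a first encoder over the document words, a second encoder over the context representation, and a decoder that weights context elements when choosing which document words to copy into the summary) can be illustrated with a toy sketch. Everything here is an assumption made for illustration only: the function names, the character-bigram vectors, the `context_weight` parameter, and the softmax scoring are hypothetical stand-ins for trained neural components and are not taken from the patent.

```python
import math
from collections import Counter


def bigrams(word):
    """Sparse character-bigram vector; a toy substitute for a learned embedding."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def first_encoder(words):
    # Hypothetical stand-in for the first encoder: one vector per document word.
    return [bigrams(w) for w in words]


def second_encoder(entity):
    # Hypothetical stand-in for the second encoder: one vector per entity token.
    return [bigrams(tok) for tok in entity.split()]


def decoder_copy_distribution(encoded_words, encoded_context, context_weight=4.0):
    """Toy decoder step: a probability of copying each document word.

    Each word is scored by its best match against the context elements,
    scaled by context_weight, and the scores are normalised with a softmax.
    """
    if not encoded_words:
        return []
    scores = [
        context_weight * max((cosine(w, c) for c in encoded_context), default=0.0)
        for w in encoded_words
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(words, entity, k=3):
    """Copy the k most probable document words, kept in document order."""
    probs = decoder_copy_distribution(first_encoder(words), second_encoder(entity))
    ranked = sorted(range(len(words)), key=lambda i: -probs[i])[:k]
    return [words[i] for i in sorted(ranked)]


document = "The Eiffel Tower opened in 1889 in Paris".split()
probs = decoder_copy_distribution(first_encoder(document), second_encoder("Eiffel Tower"))
summary = summarize(document, "Eiffel Tower", k=2)
```

In this sketch the words matching the entity receive a larger share of the copy distribution, so they are the ones carried into the summary. A real system of the kind claimed would presumably learn both encoders and the decoder weighting from data rather than rely on string overlap.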