US 11,756,541 B1
Contextual resolver for voice requests
Driss Alaoui Mrani, Levallois-Perret (FR); Ashlesha Vishnu Kadam, Seattle, WA (US); and Manigantan Sethuraman, Bothell, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Aug. 24, 2020, as Appl. No. 17/777.
Int. Cl. G10L 15/22 (2006.01); G06F 16/48 (2019.01); G06F 16/2457 (2019.01); G06F 16/438 (2019.01); G06N 5/04 (2023.01); G06F 3/16 (2006.01); G10L 15/18 (2013.01); G06N 20/00 (2019.01)
CPC G10L 15/22 (2013.01) [G06F 3/165 (2013.01); G06F 16/24578 (2019.01); G06F 16/438 (2019.01); G06F 16/48 (2019.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G10L 15/1815 (2013.01); G10L 2015/223 (2013.01); G10L 2015/228 (2013.01)]
20 Claims
OG exemplary drawing
 
1. A system, comprising:
a memory configured to store computer-executable instructions; and
a processor configured to access the memory and execute the computer-executable instructions to at least:
receive, via a voice interface of a voice-controlled device, first voice data associated with a first request to play first media content;
cause the first media content to play, via an output component, in response to the first request;
receive, via the voice interface of the voice-controlled device and while the first media content is playing, second voice data associated with a second request to play media content;
receive contextual data describing one or more contexts;
execute a ranking algorithm with the second request and the contextual data as inputs, the ranking algorithm configured to output a ranked list of one or more media content files related to the second request by:
identifying metadata slot values from the second request, the metadata slot values corresponding to the one or more contexts;
matching the metadata slot values with first metadata associated with the first media content to produce matched contexts;
matching the first metadata associated with the first media content with additional metadata associated with additional media content files to produce common metadata values;
inputting the matched contexts and the common metadata values into the ranking algorithm;
generating confidence scores corresponding to each of the matched contexts and the common metadata values, the confidence scores characterizing a ranking of the matched contexts and the common metadata values; and
generating, using the confidence scores and based at least in part on the ranking, the ranked list of one or more media content files;
select a second media content file from the ranked list; and
play the second media content file, via the output component, in response to the second request.
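
The ranking steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`MediaFile`, `matched_contexts`, `common_metadata_values`, `confidence`, `rank`) is hypothetical, as are the convention that a slot value of `None` means "resolve from the content now playing" and the additive weighting used for the confidence score. The claim does not specify how scores are computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MediaFile:
    title: str
    metadata: Dict[str, str]  # e.g. {"artist": "X", "genre": "jazz"}


def matched_contexts(slot_values: Dict[str, Optional[str]],
                     first_metadata: Dict[str, str]) -> Dict[str, str]:
    """Match slot values from the second request against the playing content.

    Assumption: a slot whose value is None refers to the current context
    (e.g. "this artist"); an explicit value must equal the playing content's.
    """
    return {
        key: first_metadata[key]
        for key, value in slot_values.items()
        if key in first_metadata and value in (None, first_metadata[key])
    }


def common_metadata_values(first_metadata: Dict[str, str],
                           other_metadata: Dict[str, str]) -> Dict[str, str]:
    """Metadata fields on which the playing content and a candidate agree."""
    return {
        key: value
        for key, value in first_metadata.items()
        if other_metadata.get(key) == value
    }


def confidence(matched: Dict[str, str], common: Dict[str, str],
               context_weight: float = 2.0) -> float:
    """Hypothetical score: shared fields that are matched contexts weigh more."""
    return sum(context_weight if key in matched else 1.0 for key in common)


def rank(slot_values: Dict[str, Optional[str]], first: MediaFile,
         candidates: List[MediaFile]) -> List[Tuple[float, MediaFile]]:
    """Return (score, file) pairs sorted from highest to lowest confidence."""
    matched = matched_contexts(slot_values, first.metadata)
    scored = [
        (confidence(matched, common_metadata_values(first.metadata, c.metadata)), c)
        for c in candidates
    ]
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    playing = MediaFile("Song A", {"artist": "X", "genre": "jazz"})
    library = [
        MediaFile("Song B", {"artist": "X", "genre": "rock"}),
        MediaFile("Song C", {"artist": "Y", "genre": "jazz"}),
        MediaFile("Song D", {"artist": "X", "genre": "jazz"}),
    ]
    # Second request such as "play more by this artist" -> artist slot, no value.
    ranked = rank({"artist": None}, playing, library)
    selected = ranked[0][1]  # select the top-ranked file for playback
    print([(score, f.title) for score, f in ranked], selected.title)
```

In this sketch the selection step simply takes the head of the ranked list. A real system could apply further filtering (availability, user preferences) before playback, which the claim leaves open.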