US 11,657,807 B2
Multi-tier speech processing and content operations
Ponnu Jacob, Seattle, WA (US); Jingqian Zhao, Bellevue, WA (US); Prathap Ramachandra, Kirkland, WA (US); Uday Kumar Kollu, Seattle, WA (US); Lior Maor Maimon, Redmond, WA (US); and Sean Gunnar Skaar, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 24, 2021, as Appl. No. 17/357,338.
Prior Publication US 2022/0415312 A1, Dec. 29, 2022
Int. Cl. G10L 15/22 (2006.01); G10L 15/18 (2013.01); G10L 15/183 (2013.01); G10L 15/32 (2013.01)
CPC G10L 15/1815 (2013.01) [G10L 15/183 (2013.01); G10L 15/22 (2013.01); G10L 15/32 (2013.01); G10L 2015/223 (2013.01)]
20 Claims
 
1. A system comprising:
computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory and configured by the executable instructions to at least:
receive audio data from a voice-enabled device, wherein the audio data represents an utterance associated with a user account;
generate intent data comprising a plurality of semantic representations of the utterance;
identify a first domain of an intent processing application hierarchy based at least partly on an association of the first domain with a semantic representation of the plurality of semantic representations,
wherein the intent processing application hierarchy comprises a plurality of domains,
wherein the first domain comprises a plurality of applications configured to perform content item operations in response to utterances associated with the semantic representation, and
wherein a content item operation comprises at least one of: recommendation of a content item, acquisition of the content item, search for the content item, or presentation of the content item;
analyze a plurality of contextual signals associated with the utterance, wherein the plurality of contextual signals comprises a first contextual signal representing one or more content item attributes of a content library associated with the user account, and wherein a second contextual signal of the plurality of contextual signals represents at least one of a timing term, a popularity term, or a recommendation term of the utterance;
identify, based on results of analyzing the plurality of contextual signals, a first application of the plurality of applications; and
assign the first application to generate a response.
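
The routing recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every class name, function name, intent label, signal term list, and scoring rule below is a hypothetical chosen for illustration, and the speech recognition and natural language understanding steps that turn audio data into semantic representations are assumed to have already run. The sketch shows only the two-tier selection: first a domain is picked from the hierarchy by its association with a semantic representation, then an application within that domain is picked from the contextual signals.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical vocabularies for the second contextual signal
# (timing, popularity, and recommendation terms of the utterance).
SIGNAL_TERMS = {
    "timing": {"new", "latest", "recent"},
    "popularity": {"popular", "top", "trending"},
    "recommendation": {"recommend", "suggest"},
}


@dataclass(frozen=True)
class SemanticRepresentation:
    intent: str        # hypothetical intent label, e.g. "PlayMusic"
    text: str          # transcribed utterance text
    confidence: float  # assumed NLU confidence score


@dataclass(frozen=True)
class ContextualSignals:
    library_attributes: frozenset  # attributes of the user's content library
    term_kinds: frozenset          # kinds of signal terms found in the utterance


@dataclass(frozen=True)
class Application:
    name: str
    operation: str  # recommendation, acquisition, search, or presentation
    score: Callable[[ContextualSignals], float]


@dataclass(frozen=True)
class Domain:
    name: str
    intents: frozenset
    applications: tuple


def identify_domain(hierarchy, representations):
    """First tier: return the domain associated with the most confident
    semantic representation that any domain handles, or None."""
    ranked = sorted(representations, key=lambda r: r.confidence, reverse=True)
    for rep in ranked:
        for domain in hierarchy:
            if rep.intent in domain.intents:
                return domain, rep
    return None


def analyze_signals(rep, library_attributes):
    """Build both contextual signals for one semantic representation."""
    words = set(rep.text.lower().split())
    kinds = frozenset(k for k, terms in SIGNAL_TERMS.items() if words & terms)
    return ContextualSignals(frozenset(library_attributes), kinds)


def select_application(domain, signals):
    """Second tier: pick the highest-scoring application in the domain."""
    return max(domain.applications, key=lambda app: app.score(signals))


# Hypothetical applications: one plays from the user's library, one recommends.
library_player = Application(
    "library_player", "presentation",
    lambda s: 1.0 if s.library_attributes and not s.term_kinds else 0.0,
)
recommender = Application(
    "recommender", "recommendation",
    lambda s: 1.0 if s.term_kinds else 0.5,
)
music = Domain("music", frozenset({"PlayMusic"}), (library_player, recommender))
weather = Domain("weather", frozenset({"GetWeather"}), ())

reps = [
    SemanticRepresentation("GetWeather", "play something popular", 0.2),
    SemanticRepresentation("PlayMusic", "play something popular", 0.9),
]
domain, rep = identify_domain([weather, music], reps)
signals = analyze_signals(rep, {"jazz"})
chosen = select_application(domain, signals)
```

In this example the utterance contains a popularity term, so the hypothetical recommender is assigned to generate the response. Without such a term, and with a non-empty content library, the same scoring rules would favor the library player.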