US 11,657,807 B2
Multi-tier speech processing and content operations
1. A system comprising:
computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory and configured by the executable instructions to at least:
receive audio data from a voice-enabled device, wherein the audio data represents an utterance associated with a user account;
generate intent data comprising a plurality of semantic representations of the utterance;
identify a first domain of an intent processing application hierarchy based at least partly on an association of the first domain with a semantic representation of the plurality of semantic representations,
wherein the intent processing application hierarchy comprises a plurality of domains,
wherein the first domain comprises a plurality of applications configured to perform content item operations in response to utterances associated with the semantic representation, and
wherein a content item operation comprises at least one of: recommendation of a content item, acquisition of the content item, search for the content item, or presentation of the content item;
analyze a plurality of contextual signals associated with the utterance, wherein the plurality of contextual signals comprises a first contextual signal representing one or more content item attributes of a content library associated with the user account, and wherein a second contextual signal of the plurality of contextual signals represents at least one of a timing term, a popularity term, or a recommendation term of the utterance;
identify, based on results of analyzing the plurality of contextual signals, a first application of the plurality of applications; and
assign the first application to generate a response.
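
The routing steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; it only assumes a simple reading of the claim. Every name in it (`Domain`, `Application`, `ContextualSignals`, `route`, the intent label `PlayMusicIntent`, the term lists, and the selection rule in `identify_application`) is hypothetical and chosen for illustration. Audio capture and speech recognition are omitted, so the utterance is passed in as already-transcribed text and the intent data as a list of semantic representation labels.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical term lists standing in for the "timing term", "popularity term",
# and "recommendation term" named in the claim.
TIMING_TERMS = {"new", "latest", "recent"}
POPULARITY_TERMS = {"popular", "top", "trending"}
RECOMMENDATION_TERMS = {"recommend", "suggest"}


@dataclass
class Application:
    name: str
    operation: str  # "recommendation", "acquisition", "search", or "presentation"


@dataclass
class Domain:
    name: str
    semantic_representations: Set[str]  # representations associated with this domain
    applications: List[Application]


@dataclass
class ContextualSignals:
    library_match: bool   # first signal: utterance mentions an attribute of the user's library
    term_kinds: Set[str]  # second signal: which kinds of terms the utterance contains


def identify_domain(hierarchy: List[Domain], intent_data: List[str]) -> Optional[Domain]:
    """Return the first domain associated with any semantic representation."""
    for representation in intent_data:
        for domain in hierarchy:
            if representation in domain.semantic_representations:
                return domain
    return None


def analyze_signals(utterance_text: str, library_attributes: Set[str]) -> ContextualSignals:
    """Derive the two contextual signals from the utterance and the content library."""
    words = set(utterance_text.lower().split())
    term_kinds = set()
    if words & TIMING_TERMS:
        term_kinds.add("timing")
    if words & POPULARITY_TERMS:
        term_kinds.add("popularity")
    if words & RECOMMENDATION_TERMS:
        term_kinds.add("recommendation")
    return ContextualSignals(bool(words & library_attributes), term_kinds)


def identify_application(domain: Domain, signals: ContextualSignals) -> Application:
    """Pick an application with an assumed, purely illustrative priority rule."""
    if signals.term_kinds & {"recommendation", "popularity"}:
        wanted = "recommendation"
    elif signals.library_match:
        wanted = "presentation"
    elif "timing" in signals.term_kinds:
        wanted = "search"
    else:
        wanted = "acquisition"
    for app in domain.applications:
        if app.operation == wanted:
            return app
    return domain.applications[0]  # fall back to the domain's first application


def route(utterance_text: str, intent_data: List[str],
          hierarchy: List[Domain], library_attributes: Set[str]) -> Optional[Application]:
    """Identify the domain, analyze the signals, and assign an application."""
    domain = identify_domain(hierarchy, intent_data)
    if domain is None:
        return None
    return identify_application(domain, analyze_signals(utterance_text, library_attributes))


# Hypothetical one-domain hierarchy used only to exercise the sketch.
EXAMPLE_HIERARCHY = [
    Domain("music", {"PlayMusicIntent"}, [
        Application("music_player", "presentation"),
        Application("music_recommender", "recommendation"),
        Application("music_search", "search"),
    ]),
]
```

With this hierarchy and a library whose attributes are `{"jazz", "rock"}`, the call `route("play some popular jazz", ["PlayMusicIntent"], EXAMPLE_HIERARCHY, {"jazz", "rock"})` selects `music_recommender`, because the popularity term takes priority under the assumed rule. The utterance "play jazz" selects `music_player`, because only the library signal is present. The claim itself does not prescribe how the signals are weighed; the fixed priority order here is one simple choice among many.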