US 12,254,886 B2
Collaborative ranking of interpretations of spoken utterances
Akshay Goel, Seattle, WA (US); Nitin Khandelwal, Sunnyvale, CA (US); Richard Park, Palo Alto, CA (US); Brian Chatham, Pleasanton, CA (US); Jonathan Eccles, San Francisco, CA (US); David Sanchez, Burlingame, CA (US); and Dmytro Lapchuk, Mountain View, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Feb. 28, 2024, as Appl. No. 18/590,549.
Application 18/590,549 is a continuation of application No. 17/537,104, filed on Nov. 29, 2021, granted, now 11,948,580.
Claims priority of provisional application 63/238,592, filed on Aug. 30, 2021.
Prior Publication US 2024/0203423 A1, Jun. 20, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/32 (2013.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
CPC G10L 15/32 (2013.01) [G10L 15/18 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A system comprising:
at least one processor; and
memory storing instructions that, when executed, cause the at least one processor to be operable to:
receive audio data that captures a spoken utterance of a user, the audio data being generated by one or more microphones of a client device of the user, and the spoken utterance being directed to an automated assistant executed at least in part at the client device;
determine, based on processing the audio data, a plurality of first-party interpretations of the spoken utterance, each of the plurality of first-party interpretations being associated with a corresponding first-party predicted value indicative of a magnitude of confidence that each of the first-party interpretations is predicted to satisfy the spoken utterance;
identify a given third-party agent capable of satisfying the spoken utterance;
transmit, to the given third-party agent and over one or more networks, and based on processing the audio data, one or more structured requests that, when received, cause the given third-party agent to determine a plurality of third-party interpretations of the spoken utterance, each of the plurality of third-party interpretations being associated with a corresponding third-party predicted value indicative of a magnitude of confidence that each of the third-party interpretations is predicted to satisfy the spoken utterance;
receive, from the given third-party agent and over one or more of the networks, the plurality of third-party interpretations of the spoken utterance;
select, based on the corresponding first-party predicted values and the corresponding third-party predicted values, a given interpretation of the spoken utterance from among the plurality of first-party interpretations and the plurality of third-party interpretations;
cause the given third-party agent to satisfy the spoken utterance based on the given interpretation of the spoken utterance;
determine whether the given interpretation is one of the plurality of first-party interpretations or one of the plurality of third-party interpretations; and
in response to determining that the given interpretation is one of the plurality of first-party interpretations:
cause the automated assistant to provide, for presentation to the user of the client device, an indication that the given interpretation is one of the plurality of first-party interpretations; and
in response to determining that the given interpretation is one of the plurality of third-party interpretations:
cause the automated assistant to provide, for presentation to the user of the client device, an indication that the given interpretation is one of the plurality of third-party interpretations.
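The selection step recited in the claim, i.e. choosing a given interpretation from the combined first-party and third-party candidates based on their corresponding predicted values, can be sketched as follows. This is an illustrative reading only: the `Interpretation` type, the `select_interpretation` function, and the example utterances are hypothetical names introduced here, not structures disclosed in the patent, and the claim does not limit selection to a simple maximum over confidence scores.

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    text: str     # candidate interpretation of the spoken utterance
    score: float  # predicted value: magnitude of confidence it satisfies the utterance
    source: str   # "first-party" (assistant) or "third-party" (agent)

def select_interpretation(first_party, third_party):
    """Pick the candidate with the highest predicted value from the
    pooled first-party and third-party interpretations (one possible
    reading of the claimed selection step)."""
    candidates = first_party + third_party
    return max(candidates, key=lambda i: i.score)

# Hypothetical usage: two pools of ranked interpretations.
fp = [Interpretation("play jazz playlist", 0.72, "first-party")]
tp = [Interpretation("play 'Jazz' by artist X", 0.81, "third-party"),
      Interpretation("play jazz radio", 0.55, "third-party")]

chosen = select_interpretation(fp, tp)
# Per the final limitations of the claim, the assistant would then present
# an indication of whether the chosen interpretation was first- or third-party.
print(chosen.source)
```

In this sketch the third-party candidate wins (0.81 > 0.72), so the assistant would surface the third-party indication; had the first-party score been higher, the first-party branch of the claim would apply instead.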