US 11,749,281 B2
Neural speech-to-meaning
Sudharsan Krishnaswamy, San Jose, CA (US); Maisy Wieman, Boulder, CO (US); and Jonah Probell, Alviso, CA (US)
Assigned to SoundHound AI IP, LLC, Santa Clara, CA (US)
Filed by SoundHound, Inc., Santa Clara, CA (US)
Filed on Dec. 4, 2019, as Appl. No. 16/703,783.
Prior Publication US 2021/0174806 A1, Jun. 10, 2021
Int. Cl. G10L 15/26 (2006.01); G06F 3/16 (2006.01); G10L 15/18 (2013.01); G10L 15/183 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
CPC G10L 15/26 (2013.01) [G06F 3/167 (2013.01); G10L 15/183 (2013.01); G10L 15/1815 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/223 (2013.01)]
24 Claims
 
1. A machine for recognizing an intent in speech audio, the machine comprising:
a variable recognizer that processes speech audio features, computes a probability of the speech audio having any of a plurality of enumerated variable values, and outputs the value of the plurality of enumerated variable values with the highest probability; and
an intent recognizer that processes speech audio features, computes a probability of the speech audio having the intent, and in response to the probability being above an intent threshold, produces a request for a virtual assistant action,
wherein the machine has no lexical representation of the speech audio;
the variable recognizer indicates the probability of the speech audio having a value from the plurality of enumerated variable values;
the intent recognizer conditions its output of a request for an action on the probability of the speech audio having a value from the plurality of enumerated variable values; and
the conditioning is based on a delayed indication of the probability of the speech audio having a value from the plurality of enumerated variable values.
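
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The class names, the `variable_threshold` parameter, the frame-based delay, and the example values ("Boston", "Chicago", "weather") are assumptions made for illustration. The neural models that would compute probabilities from speech audio features are omitted, so per-frame probabilities are passed in directly. No transcript or other lexical representation appears anywhere in the flow.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Sequence, Tuple


@dataclass
class VariableOutput:
    value: str          # enumerated value with the highest probability
    probability: float  # probability assigned to that value


class VariableRecognizer:
    """Selects the most probable value from a fixed set of enumerated values."""

    def __init__(self, values: Sequence[str]) -> None:
        if not values:
            raise ValueError("at least one enumerated value is required")
        self.values = list(values)

    def recognize(self, value_probs: Sequence[float]) -> VariableOutput:
        # value_probs stands in for the output of an acoustic model
        # (hypothetical; the claim does not specify its form).
        if len(value_probs) != len(self.values):
            raise ValueError("one probability per enumerated value is required")
        best = max(range(len(self.values)), key=lambda i: value_probs[i])
        return VariableOutput(self.values[best], value_probs[best])


class IntentRecognizer:
    """Emits an action request when the intent probability exceeds a threshold,
    conditioned on a delayed indication from the variable recognizer."""

    def __init__(self, intent: str, intent_threshold: float,
                 variable_threshold: float, delay_frames: int) -> None:
        self.intent = intent
        self.intent_threshold = intent_threshold
        # Assumption: conditioning is modeled as a simple threshold test.
        self.variable_threshold = variable_threshold
        # Pre-filling with None delays each indication by delay_frames steps.
        self._pending: Deque[Optional[VariableOutput]] = deque([None] * delay_frames)

    def step(self, intent_prob: float, variable: VariableOutput) -> Optional[dict]:
        self._pending.append(variable)
        delayed = self._pending.popleft()
        if delayed is None:
            return None  # no delayed indication is available yet
        if (intent_prob > self.intent_threshold
                and delayed.probability > self.variable_threshold):
            return {"intent": self.intent, "variable": delayed.value}
        return None


def run(frames: Sequence[Tuple[Sequence[float], float]]) -> List[Optional[dict]]:
    """Feeds (value_probs, intent_prob) pairs through both recognizers."""
    variables = VariableRecognizer(["Boston", "Chicago"])
    intents = IntentRecognizer("weather", intent_threshold=0.8,
                               variable_threshold=0.5, delay_frames=2)
    return [intents.step(p, variables.recognize(vp)) for vp, p in frames]
```

In this sketch the delay lets a variable value spoken early in an utterance condition an intent decision made a few frames later, for example when the intent only becomes probable after the variable has already been heard.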