US 12,217,750 B2
Using multiple modality input to feedback context for natural language understanding
Michael Bodell, Santa Clara, CA (US); John Bain, Federal Way, CA (US); Robert Chambers, Sammamish, WA (US); Karen M. Cross, Santa Barbara, CA (US); Michael Kim, Sunnyvale, CA (US); Nick Gedge, Redmond, WA (US); Daniel Frederick Penn, Sammamish, WA (US); Kunal Patel, Sammamish, WA (US); Edward Mark Tecot, Sunnyvale, CA (US); and Jeremy C. Waltmunson, Seattle, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jan. 21, 2022, as Appl. No. 17/581,395.
Application 17/581,395 is a continuation of application No. 16/419,105, filed on May 22, 2019, granted, now 11,264,023.
Application 16/419,105 is a continuation of application No. 15/436,437, filed on Feb. 17, 2017, granted, now 10,332,514, issued on Jun. 25, 2019.
Application 15/436,437 is a continuation of application No. 13/219,891, filed on Aug. 29, 2011, granted, now 9,576,573, issued on Feb. 21, 2017.
Prior Publication US 2022/0148594 A1, May 12, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/22 (2006.01); G10L 15/26 (2006.01)
CPC G10L 15/22 (2013.01) [G10L 2015/227 (2013.01); G10L 2015/228 (2013.01); G10L 15/26 (2013.01)]
20 Claims
 
1. A computer-implemented method for recognizing speech, comprising:
receiving a spoken query for a first input field;
receiving an input in a second input field of an interactive input form of a computer device;
identifying, by a machine learning model, a category for the spoken query based on a context of at least a value of the received input in the second input field, wherein the first input field and the second input field are distinct, the machine learning model is trained based on the input to the second input field and spoken words received while interactively outputting content, and the machine learning model predicts the context as the category for the spoken query based on the data input and content of a display;
constructing a statistical dialog manager according to the identified category, wherein the statistical dialog manager comprises a weight of a word in the identified category; and
converting the spoken query to generate a response text according to the statistical dialog manager according to the category, the statistical dialog manager including statistically weighted terms associated with the category.
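
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the lookup table stands in for the trained machine learning model, the weights are invented illustrative values, and every name (`identify_category`, `StatisticalDialogManager`, `convert_spoken_query`, `CATEGORY_WEIGHTS`, `FIELD_VALUE_TO_CATEGORY`) is hypothetical. The spoken query is assumed to arrive as a list of candidate transcriptions, each with a recognizer score.

```python
from dataclasses import dataclass, field

# Hypothetical per-category word weights (illustrative values, not from the patent).
CATEGORY_WEIGHTS = {
    "restaurant": {"thai": 2.0, "pizza": 2.0, "sushi": 2.0},
    "movie": {"tie": 1.0, "matrix": 2.0},
}

# Hypothetical stand-in for the trained model: maps the value typed into the
# second input field to a category for the spoken query.
FIELD_VALUE_TO_CATEGORY = {
    "restaurants": "restaurant",
    "movies": "movie",
}


def identify_category(second_field_value: str) -> str:
    """Pick a category from the context given by the second input field."""
    key = second_field_value.strip().lower()
    return FIELD_VALUE_TO_CATEGORY.get(key, "general")


@dataclass
class StatisticalDialogManager:
    """Holds the weighted terms associated with one category."""
    category: str
    weights: dict = field(default_factory=dict)

    def score(self, hypothesis: str) -> float:
        # Sum the category weight of every word in the candidate text;
        # words outside the category vocabulary contribute nothing.
        return sum(self.weights.get(w, 0.0) for w in hypothesis.lower().split())


def build_dialog_manager(category: str) -> StatisticalDialogManager:
    """Construct a manager from the weights of the identified category."""
    return StatisticalDialogManager(category, CATEGORY_WEIGHTS.get(category, {}))


def convert_spoken_query(hypotheses, manager: StatisticalDialogManager) -> str:
    """Return the candidate text with the best combined score.

    `hypotheses` is a list of (text, recognizer_score) pairs.
    """
    best = max(hypotheses, key=lambda h: h[1] + manager.score(h[0]))
    return best[0]


# Two acoustically similar candidates for a query spoken into the first field.
candidates = [("tie food", 0.6), ("thai food", 0.5)]

manager = build_dialog_manager(identify_category("Restaurants"))
result = convert_spoken_query(candidates, manager)
```

With "Restaurants" in the second field, the category weights favor "thai food" even though the recognizer alone scored "tie food" slightly higher. With an unrecognized field value, the sketch falls back to the recognizer score only.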