US 12,430,011 B2
Voice assistant-enabled client application with user view context and multi-modal input support
Tudor Buzasu Klein, Yokohama (JP); Viktoriya Taranov, Kirkland, WA (US); Sergiy Gavrylenko, Issaquah, WA (US); Jaclyn Carley Knapp, Redmond, WA (US); Andrew Paul McGovern, Redmond, WA (US); Harris Syed, Redmond, WA (US); Chad Steven Estes, Redmond, WA (US); Jesse Daniel Eskes Rusak, Redmond, WA (US); David Ernesto Heekin Burkett, Redmond, WA (US); Allison Anne O'Mahony, Redmond, WA (US); Ashok Kuppusamy, Redmond, WA (US); Jonathan Reed Harris, Redmond, WA (US); Jose Miguel Rady Allende, Redmond, WA (US); Diego Hernan Carlomagno, Redmond, WA (US); Talon Edward Ireland, Redmond, WA (US); Michael Francis Palermiti, II, Redmond, WA (US); Richard Leigh Mains, Redmond, WA (US); and Jayant Krishnamurthy, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Mar. 27, 2024, as Appl. No. 18/619,127.
Application 18/619,127 is a continuation of application No. 17/508,762, filed on Oct. 22, 2021, granted, now 11,972,095.
Application 17/508,762 is a continuation-in-part of application No. 17/364,362, filed on Jun. 30, 2021, granted, now 11,789,696, issued on Oct. 17, 2023.
Claims priority of provisional application 63/165,037, filed on Mar. 23, 2021.
Prior Publication US 2024/0241624 A1, Jul. 18, 2024
Int. Cl. G06F 3/0484 (2022.01); G06F 3/16 (2006.01); G10L 15/08 (2006.01); G10L 15/22 (2006.01)
CPC G06F 3/0484 (2013.01) [G06F 3/167 (2013.01); G10L 15/08 (2013.01); G10L 15/22 (2013.01); G10L 2015/088 (2013.01)] 15 Claims
OG exemplary drawing
 
1. A system comprising:
at least one computer processor; and
one or more computer storage media storing computer-useable instructions that, when used by the at least one computer processor, cause the at least one computer processor to perform operations comprising:
detecting a first user action of a first user;
subsequent to the detecting, capturing audio data comprising a voice utterance of the first user;
receiving, via a user interface, a manual user input of the first user;
receiving an indication that the manual user input of the first user was performed by the first user later in time than the voice utterance of the first user; and
based at least in part on the indication that the manual user input of the first user was performed by the first user later in time than the voice utterance of the first user, responding to only the manual user input and refraining from responding to the voice utterance.
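The arbitration rule recited in claim 1 (when a manual user input is performed later in time than a voice utterance, respond only to the manual input and refrain from responding to the voice utterance) can be illustrated with a minimal sketch. The names `InputEvent` and `arbitrate` are hypothetical and are not taken from the patent; timestamps are assumed to be comparable numeric values.

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str      # "voice" or "manual" (illustrative labels)
    timestamp: float   # time the input was performed, e.g. seconds since epoch
    payload: str       # the utterance text or the manual action

def arbitrate(voice: InputEvent, manual: InputEvent) -> InputEvent:
    """Select the single input to respond to.

    Sketch of the claimed logic: if the manual input was performed
    later in time than the voice utterance, respond only to the
    manual input; otherwise respond to the voice utterance.
    """
    if manual.timestamp > voice.timestamp:
        return manual   # refrain from responding to the earlier voice utterance
    return voice

# Example: the manual click arrives after the voice command,
# so only the manual input is acted on.
chosen = arbitrate(
    InputEvent("voice", 10.0, "insert a table"),
    InputEvent("manual", 12.5, "clicked Save button"),
)
```

This sketch only captures the temporal-priority rule; the claim's surrounding steps (detecting a user action, capturing audio, receiving the indication via a user interface) are assumed to happen upstream of `arbitrate`.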