US 12,254,876 B2
Recognizing speech in the presence of additional audio
Diego Melendo Casado, Mountain View, CA (US); Ignacio Lopez Moreno, Brooklyn, NY (US); and Javier Gonzalez-Dominguez, Madrid (ES)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Mar. 19, 2024, as Appl. No. 18/609,542.
Application 18/609,542 is a continuation of application No. 17/303,139, filed on May 21, 2021, granted, now 11,942,083.
Application 17/303,139 is a continuation of application No. 16/548,947, filed on Aug. 23, 2019, granted, now 11,031,002, issued on Jun. 8, 2021.
Application 16/548,947 is a continuation of application No. 15/887,034, filed on Feb. 2, 2018, granted, now 10,431,213, issued on Oct. 1, 2019.
Application 15/887,034 is a continuation of application No. 15/460,342, filed on Mar. 16, 2017, granted, now 9,922,645, issued on Mar. 20, 2018.
Application 15/460,342 is a continuation of application No. 15/093,309, filed on Apr. 7, 2016, granted, now 9,601,116, issued on Mar. 21, 2017.
Application 15/093,309 is a continuation of application No. 14/181,345, filed on Feb. 14, 2014, granted, now 9,318,112, issued on Apr. 19, 2016.
Prior Publication US 2024/0221737 A1, Jul. 4, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/00 (2013.01); G06F 3/16 (2006.01); G10L 15/20 (2006.01); G10L 15/22 (2006.01); G10L 17/06 (2013.01); G10L 21/034 (2013.01); G10L 25/84 (2013.01); H03G 3/30 (2006.01); G10L 15/26 (2006.01); G10L 17/00 (2013.01)

CPC G10L 15/20 (2013.01) [G06F 3/165 (2013.01); G06F 3/167 (2013.01); G10L 15/222 (2013.01); G10L 17/06 (2013.01); G10L 21/034 (2013.01); G10L 25/84 (2013.01); H03G 3/3005 (2013.01); G10L 15/26 (2013.01); G10L 17/00 (2013.01)]

20 Claims

1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising:

receiving a first query spoken by a user and captured by a microphone of a computing device associated with the user;

providing, for audible playback from the computing device, a text-to-speech (TTS) output generated by a TTS system associated with the computing device, the TTS output comprising synthesized audio that conveys a response to the first query;

while the computing device is audibly playing back the TTS output:

detecting a barge-in event from the user to provide a second query;

in response to detecting the barge-in event, initiating a reduction in an audio output level of the computing device; and

receiving an audio signal captured by the microphone that conveys the second query spoken by the user; and

providing the audio signal characterizing the second query to a speech recognition engine.
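
The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how a barge-in event is detected, so the sketch assumes a simple frame-energy threshold. The names `BargeInSession`, `on_microphone_frame`, `finish`, `ducked_level`, and `energy_threshold` are hypothetical, and `recognizer` is a stand-in callable for the speech recognition engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


@dataclass
class BargeInSession:
    """Hypothetical state for one TTS playback during which the user may barge in."""

    # Stand-in for the speech recognition engine (assumption: takes raw samples).
    recognizer: Callable[[List[float]], str]
    output_level: float = 1.0       # current audio output level of the device
    ducked_level: float = 0.2       # assumed reduced level after a barge-in
    energy_threshold: float = 0.1   # assumed detection threshold, not from the claim
    barge_in_detected: bool = False
    captured: List[float] = field(default_factory=list)

    def on_microphone_frame(self, frame: Sequence[float]) -> None:
        """Process one microphone frame while the TTS output is playing."""
        if not self.barge_in_detected and rms(frame) >= self.energy_threshold:
            # Barge-in detected: initiate a reduction in the output level.
            self.barge_in_detected = True
            self.output_level = min(self.output_level, self.ducked_level)
        if self.barge_in_detected:
            # Keep the audio that conveys the second query.
            self.captured.extend(frame)

    def finish(self) -> Optional[str]:
        """Provide the captured audio to the recognizer; None if no barge-in occurred."""
        if not self.barge_in_detected:
            return None
        return self.recognizer(self.captured)
```

In use, a playback loop would call `on_microphone_frame` for each captured frame and `finish` once the user stops speaking. A real system would likely replace the energy threshold with a more robust detector, since the TTS output itself can reach the microphone.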