US 11,942,083 B2
Recognizing speech in the presence of additional audio
Diego Melendo Casado, Mountain View, CA (US); Ignacio Lopez Moreno, New York, NY (US); and Javier Gonzalez-Dominguez, Madrid (ES)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on May 21, 2021, as Appl. No. 17/303,139.
Application 17/303,139 is a continuation of application No. 16/548,947, filed on Aug. 23, 2019, granted, now 11,031,002.
Application 16/548,947 is a continuation of application No. 15/887,034, filed on Feb. 2, 2018, granted, now 10,431,213, issued on Oct. 1, 2019.
Application 15/887,034 is a continuation of application No. 15/460,342, filed on Mar. 16, 2017, granted, now 9,922,645, issued on Mar. 20, 2018.
Application 15/460,342 is a continuation of application No. 15/093,309, filed on Apr. 7, 2016, granted, now 9,601,116, issued on Mar. 21, 2017.
Application 15/093,309 is a continuation of application No. 14/181,345, filed on Feb. 14, 2014, granted, now 9,318,112, issued on Apr. 19, 2016.
Prior Publication US 2021/0272562 A1, Sep. 2, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/00 (2013.01); G06F 3/16 (2006.01); G10L 15/20 (2006.01); G10L 15/22 (2006.01); G10L 17/06 (2013.01); G10L 21/034 (2013.01); G10L 25/84 (2013.01); H03G 3/30 (2006.01); G10L 15/26 (2006.01); G10L 17/00 (2013.01)

CPC G10L 15/20 (2013.01) [G06F 3/165 (2013.01); G06F 3/167 (2013.01); G10L 15/222 (2013.01); G10L 17/06 (2013.01); G10L 21/034 (2013.01); G10L 25/84 (2013.01); H03G 3/3005 (2013.01); G10L 15/26 (2013.01); G10L 17/00 (2013.01)]

20 Claims

1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising:

while audio is being played back from a computing device, receiving a first audio signal captured by a microphone of the computing device, the first audio signal comprising the played back audio and speech audio corresponding to a query, the played back audio different than the speech audio corresponding to the query;

processing, using a neural network-based model, the first audio signal to determine that the speech audio corresponding to the query was spoken by a user of the computing device; and

in response to determining that the speech audio corresponding to the query was spoken by the user, generating a second audio signal that comprises the speech audio corresponding to the query and suppresses the played back audio from the first audio signal captured by the microphone.
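
The three operations of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on assumptions the claim does not state: the device keeps a sample-aligned copy of the audio it is playing back (`playback_ref`), the played back audio reaches the microphone scaled by a single unknown gain, and suppression is done by least-squares subtraction of that reference. The names `estimate_gain`, `suppress_playback`, and `process_capture` are hypothetical, and `is_user_speech` is a caller-supplied stand-in for the neural network-based model, which is not reproduced here.

```python
import math
from typing import Callable, List, Optional

Signal = List[float]


def estimate_gain(mic: Signal, playback_ref: Signal) -> float:
    """Least-squares gain g that minimizes sum((mic - g * playback_ref) ** 2)."""
    num = sum(m * r for m, r in zip(mic, playback_ref))
    den = sum(r * r for r in playback_ref)
    return num / den if den > 0.0 else 0.0


def suppress_playback(mic: Signal, playback_ref: Signal) -> Signal:
    """Subtract the scaled playback reference from the microphone capture.

    Assumption: a single scalar gain models the speaker-to-microphone path.
    """
    if len(mic) != len(playback_ref):
        raise ValueError("mic and playback_ref must have the same length")
    g = estimate_gain(mic, playback_ref)
    return [m - g * r for m, r in zip(mic, playback_ref)]


def process_capture(
    mic: Signal,
    playback_ref: Signal,
    is_user_speech: Callable[[Signal], bool],
) -> Optional[Signal]:
    """Sketch of the claimed flow for one capture window.

    mic            -- first audio signal: played back audio plus query speech
    playback_ref   -- hypothetical copy of the audio the device is playing
    is_user_speech -- hypothetical stand-in for the neural network-based model
    Returns the second audio signal, or None when no user speech is detected.
    """
    if not is_user_speech(mic):
        return None
    return suppress_playback(mic, playback_ref)
```

Under these assumptions the second audio signal is produced only after the detection step succeeds, mirroring the "in response to determining" condition of the claim. A practical system would likely need an adaptive filter in place of the single gain, since room acoustics add delay and reverberation that one scalar cannot model.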