US 12,444,431 B1
Microphone reference echo cancellation
Carlos Renato Nakagawa, San Jose, CA (US); Ludger Solbach, San Jose, CA (US); and Robert Ayrapetian, Morgan Hill, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Sep. 29, 2021, as Appl. No. 17/488,471.
Int. Cl. G10L 21/0364 (2013.01); G10K 11/178 (2006.01); G10L 21/0208 (2013.01); H04L 27/26 (2006.01); H04R 1/34 (2006.01); G10L 21/0216 (2013.01)

CPC G10L 21/0364 (2013.01) [G10K 11/17823 (2018.01); H04L 27/2651 (2021.01); H04L 27/26524 (2021.01); H04R 1/342 (2013.01); G10L 2021/02165 (2013.01); G10L 2021/02166 (2013.01)]

23 Claims

1. A computer-implemented method, the method comprising:

receiving first audio data associated with a first microphone;

receiving second audio data associated with a second microphone, the second audio data representing a combination of speech and noise;

generating, by a first adaptive filter using the first audio data as a first target signal and the second audio data as a first reference signal, a first portion of third audio data, wherein generating the first portion of the third audio data further comprises:

generating first reference audio data by applying first filter coefficient values to the second audio data, the first reference audio data representing the noise, and

generating the first portion of the third audio data by subtracting the first reference audio data from a first portion of the first audio data;

generating, by a second adaptive filter using the second audio data as a second target signal and the second audio data as a second reference signal, a first portion of fourth audio data, wherein generating the first portion of the fourth audio data further comprises:

generating second reference audio data by applying second filter coefficient values to the second audio data, the second reference audio data representing the noise, and

generating the first portion of the fourth audio data by subtracting the second reference audio data from a first portion of the second audio data;

generating, by a beamformer component, a first portion of directional audio data using the first portion of the third audio data and the first portion of the fourth audio data, the directional audio data comprising:

first beamformed audio data corresponding to a first direction relative to a device, and

second beamformed audio data corresponding to a second direction relative to the device, the second direction different from the first direction; and

causing an action to be performed using the directional audio data.
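
The signal flow recited in claim 1 can be illustrated with a minimal sketch, under several assumptions the claim itself does not state. The claim names only "adaptive filter" and "beamformer component", so the NLMS update rule, the delay-and-sum beamformer, the one-sample reference delay used for the second filter (so that it does not trivially cancel its own target), and every name and parameter below (nlms_cancel, beamform, demo, taps, mu, ref_delay, steer) are hypothetical choices made for illustration only, not the patented implementation.

```python
import math
import random


def nlms_cancel(target, reference, taps=8, mu=0.5, ref_delay=0, eps=1e-8):
    """Estimate the noise from `reference` and subtract it from `target`.

    Returns (residual, weights). NLMS is an assumed update rule; the claim
    only says "adaptive filter".
    """
    w = [0.0] * taps  # filter coefficient values, adapted sample by sample
    residual = []
    for n in range(len(target)):
        # Newest-first window of (optionally delayed) reference samples, zero-padded.
        x = [reference[i] if i >= 0 else 0.0
             for i in (n - ref_delay - k for k in range(taps))]
        estimate = sum(wk * xk for wk, xk in zip(w, x))  # "reference audio data"
        e = target[n] - estimate                         # subtract it from the target
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        residual.append(e)
    return residual, w


def _delay(sig, d):
    """Delay a signal by d whole samples (0 <= d <= len(sig)), keeping its length."""
    return [0.0] * d + list(sig[:len(sig) - d])


def beamform(ch_a, ch_b, steer=1):
    """Two-direction delay-and-sum over two channels (assumed beamformer type)."""
    first = [(a + b) / 2 for a, b in zip(ch_a, _delay(ch_b, steer))]
    second = [(a + b) / 2 for a, b in zip(_delay(ch_a, steer), ch_b)]
    return first, second


def demo(n=2000, seed=0):
    """Run the claimed chain on synthetic signals (all values are made up)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    speech = [0.3 * math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
    # First microphone: speech plus a delayed, scaled copy of the noise.
    mic1 = [speech[t] + (0.8 * noise[t - 2] if t >= 2 else 0.0) for t in range(n)]
    # Second microphone: a combination of speech and noise, used as the reference.
    mic2 = [0.1 * speech[t] + noise[t] for t in range(n)]
    third, _ = nlms_cancel(mic1, mic2)                # target mic1, reference mic2
    fourth, _ = nlms_cancel(mic2, mic2, ref_delay=1)  # target and reference both mic2
    return beamform(third, fourth)                    # two directional outputs
```

Calling demo() returns two lists of equal length, one per look direction, which stand in for the first and second beamformed audio data. A frame-based or frequency-domain design would follow the same order of steps: filter, subtract, then beamform.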