US 12,073,845 B2
Automatic gain control based on machine learning level estimation of the desired signal
Karl Allan Tore Rudberg, Älvsjö (SE); and Alessio Bazzica, Järfälla (SE)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Mar. 13, 2023, as Appl. No. 18/120,911.
Application 18/120,911 is a continuation of application No. 16/820,578, filed on Mar. 16, 2020, granted, now 11,605,392.
Prior Publication US 2023/0215451 A1, Jul. 6, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 21/0208 (2013.01); G06N 20/00 (2019.01); G10L 21/0364 (2013.01); G10L 25/27 (2013.01); G10L 25/84 (2013.01)

CPC G10L 21/0208 (2013.01) [G06N 20/00 (2019.01); G10L 21/0364 (2013.01); G10L 25/27 (2013.01); G10L 25/84 (2013.01)]

20 Claims

1. A system comprising:

a memory; and

a processing device communicably coupled to the memory, the processing device to:

receive, from a plurality of input devices, audio data, wherein the audio data of each input device corresponds to a plurality of frequency ranges;

determine, for each of the plurality of frequency ranges for each input device of the plurality of input devices, a speech energy level by providing audio data corresponding to each frequency range as input to a model that is trained to determine a speech energy level of given audio data in the corresponding frequency range of the plurality of frequency ranges;

for each input device, determine a statistical value associated with the speech energy level of the input device;

identify a strongest input device, wherein the strongest input device has the highest statistical value associated with the speech energy level;

compare the statistical value associated with the speech energy level of each input device other than the strongest input device with the statistical value associated with the speech energy level of the strongest input device; and

depending on the comparing, determine whether to update, for a respective input device, a gain value to an estimated target gain value based on the statistical value associated with the speech energy level of the respective input device.
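
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the claim does not specify: the trained model is replaced by a hypothetical RMS-in-dB stand-in (`placeholder_level_db`), the "statistical value" is taken to be the mean over frequency ranges, the comparison is a fixed dB gap (`max_gap_db`), and the target gain is the difference between a hypothetical `target_level_db` and the device's statistical value. The strongest device is also updated here, which is a further assumption.

```python
import math
from statistics import mean


def placeholder_level_db(samples):
    """Hypothetical stand-in for the trained model: RMS level in dB.

    The claim recites a model trained to estimate speech energy; this
    function only measures total energy and is used for illustration.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-10))


def update_gains(band_audio, current_gains_db, target_level_db=-20.0,
                 max_gap_db=10.0, level_model=placeholder_level_db):
    """Sketch of the claimed steps.

    band_audio: {device_id: [samples_for_band_0, samples_for_band_1, ...]}
    current_gains_db: {device_id: gain currently applied, in dB}
    Returns (strongest_device, stats, new_gains_db).
    """
    # Per-device statistical value: mean of the per-band levels (assumed).
    stats = {
        dev: mean(level_model(band) for band in bands)
        for dev, bands in band_audio.items()
    }

    # Strongest device: highest statistical value.
    strongest = max(stats, key=stats.get)

    new_gains = dict(current_gains_db)
    for dev, stat in stats.items():
        # Compare each device with the strongest one (assumed threshold
        # rule): update only if the device is within max_gap_db of it.
        if stats[strongest] - stat <= max_gap_db:
            # Estimated target gain (assumed): bring the level to target.
            new_gains[dev] = target_level_db - stat
    return strongest, stats, new_gains
```

Under these assumptions, a device whose level falls far below the strongest device keeps its current gain, so a microphone picking up mostly background noise is not amplified toward the target level.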