CPC G10L 21/0208 (2013.01) [G06N 20/00 (2019.01); G10L 21/0364 (2013.01); G10L 25/27 (2013.01); G10L 25/84 (2013.01)] | 20 Claims |
1. A system comprising:
a memory; and
a processing device communicably coupled to the memory, the processing device to:
receive, from a plurality of input devices, audio data, wherein the audio data of each input device corresponds to a plurality of frequency ranges;
determine, for each of the plurality of frequency ranges for each input device of the plurality of input devices, a speech energy level by providing audio data corresponding to each frequency range as input to a model that is trained to determine a speech energy level of given audio data in the corresponding frequency range of the plurality of frequency ranges;
for each input device, determine a statistical value associated with the speech energy level of the input device;
identify a strongest input device, wherein the strongest input device has the highest statistical value associated with the speech energy level;
compare the statistical value associated with the speech energy level of each input device other than the strongest input device with the statistical value associated with the speech energy level of the strongest input device; and
depending on the comparing, determine whether to update, for a respective input device, a gain value to an estimated target gain value based on the statistical value associated with the speech energy level of the respective input device.
|
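The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several parts are assumptions that the claim does not specify: the function `toy_speech_energy` is a hypothetical stand-in for the trained model, the statistical value is taken to be the mean across frequency ranges, the comparison is a simple ratio test against the strongest device (`ratio_threshold`), and the estimated target gain is `target_level / statistic`.

```python
from statistics import mean
from typing import Callable, Dict, Optional, Sequence, Tuple

Band = Sequence[float]  # audio samples for one frequency range


def toy_speech_energy(band: Band) -> float:
    """Hypothetical stand-in for the trained model: mean squared amplitude."""
    return sum(x * x for x in band) / max(len(band), 1)


def device_statistic(bands: Sequence[Band], model: Callable[[Band], float]) -> float:
    """Statistical value for one device (assumed here: mean over frequency ranges)."""
    return mean(model(band) for band in bands)


def decide_gain_updates(
    audio: Dict[str, Sequence[Band]],
    model: Callable[[Band], float] = toy_speech_energy,
    ratio_threshold: float = 0.5,  # assumed comparison rule, not from the claim
    target_level: float = 1.0,     # assumed target speech level, not from the claim
) -> Tuple[str, Dict[str, Optional[float]]]:
    """Return the strongest device and, per device, a new gain or None (no update)."""
    # Speech energy level per frequency range, reduced to one statistic per device.
    stats = {dev: device_statistic(bands, model) for dev, bands in audio.items()}

    # The strongest device has the highest statistical value.
    strongest = max(stats, key=stats.get)

    updates: Dict[str, Optional[float]] = {}
    for dev, stat in stats.items():
        # Compare each other device with the strongest one (assumed ratio test).
        comparable = dev == strongest or stat >= ratio_threshold * stats[strongest]
        if comparable and stat > 0:
            # Estimated target gain based on the device's own statistic (assumption).
            updates[dev] = target_level / stat
        else:
            updates[dev] = None  # leave the current gain unchanged
    return strongest, updates


if __name__ == "__main__":
    audio = {
        "mic_a": [[1.0, 1.0], [1.0, 1.0]],
        "mic_b": [[0.1, 0.1], [0.1, 0.1]],
        "mic_c": [[0.8, 0.8], [0.8, 0.8]],
    }
    print(decide_gain_updates(audio))
```

In this sketch a device whose statistic falls well below that of the strongest device keeps its current gain, which reflects one plausible reading of "depending on the comparing"; other decision rules would fit the claim language equally well.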