US 12,456,476 B2
Noise suppression for speech data with reduced power consumption
Chandan Karadagur Ananda Reddy, Cupertino, CA (US); and Navin Chatlani, Palo Alto, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 14, 2022, as Appl. No. 18/081,492.
Prior Publication US 2024/0203438 A1, Jun. 20, 2024
Int. Cl. G10L 21/0232 (2013.01); G10L 21/0224 (2013.01); G10L 25/60 (2013.01); G10L 21/0216 (2013.01)

CPC G10L 21/0232 (2013.01) [G10L 21/0224 (2013.01); G10L 25/60 (2013.01); G10L 2021/02163 (2013.01)]

17 Claims

1. A computer-implemented method comprising:

receiving, by one or more processors, a current time frame of speech data in a time domain;

transforming the current time frame to a current frequency frame of the speech data in a frequency domain;

determining, using a noise classifier implemented by the one or more processors, a classification of noise content in the current frequency frame based at least on noise content determined in the current frequency frame, wherein the classification is selected from at least two classifications detectable by the noise classifier, wherein the two classifications include a first classification associated with creating and applying a current noise suppression mask to the current frequency frame, and a second classification associated with applying a previously-created noise suppression mask to the current frequency frame without creating the current noise suppression mask;

in response to determining the first classification:

creating, by the one or more processors, the current noise suppression mask for the current frequency frame based on the noise content in the current frequency frame, wherein creating the current noise suppression mask includes determining one or more gain functions associated with one or more frequency bands of the current frequency frame; and

applying, by the one or more processors, the current noise suppression mask to the current frequency frame to suppress the noise content in the current frequency frame and obtain a current noise-suppressed frequency frame of the speech data;

in response to determining the second classification:

selecting, by the one or more processors, the previously-created noise suppression mask, wherein the previously-created noise suppression mask was created and applied based on a prior frequency frame of the speech data; and

applying, by the one or more processors, the previously-created noise suppression mask to the current frequency frame to suppress the noise content in the current frequency frame and obtain the current noise-suppressed frequency frame of the speech data, without creating the current noise suppression mask;

transforming the current noise-suppressed frequency frame of the speech data to a current noise-suppressed time frame of the speech data that is in the time domain; and

outputting the current noise-suppressed time frame of the speech data that provides audio output having reduced noise.
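
For illustration only, the flow recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `Suppressor`, the labels `CREATE` and `REUSE`, the noise proxy (mean power of the upper half of the spectrum), the relative-change threshold, and the Wiener-like per-band gain with a floor are all hypothetical choices that do not appear in the claim. A naive DFT is used so that the example needs only the Python standard library, and every frame is assumed to have the same length.

```python
import cmath
import math

CREATE, REUSE = "create_mask", "reuse_mask"  # hypothetical labels for the two classifications


def dft(x):
    """Naive O(n^2) forward transform: time frame -> frequency frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse transform: frequency frame -> real time frame."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


class Suppressor:
    def __init__(self, num_bands=4, change_threshold=0.5, gain_floor=0.1):
        self.num_bands = num_bands
        self.change_threshold = change_threshold  # assumed tuning parameter
        self.gain_floor = gain_floor              # assumed lower bound on any gain
        self.mask = None          # previously-created mask, one gain per bin
        self.mask_noise = None    # noise estimate the stored mask was built from
        self.masks_created = 0

    def _noise_estimate(self, power):
        # Assumption: the upper half of the spectrum holds mostly noise.
        half = len(power) // 2
        upper = power[half // 2 + 1: half + 1]
        return sum(upper) / len(upper)

    def classify(self, noise):
        # Toy classifier: build a new mask only if the noise estimate has
        # drifted noticeably since the stored mask was created.
        if self.mask is None:
            return CREATE
        change = abs(noise - self.mask_noise) / (self.mask_noise + 1e-12)
        return CREATE if change > self.change_threshold else REUSE

    def _band_of(self, k, n):
        # Mirror bins k and n-k share a band so the output stays real-valued.
        return min(k, n - k) * self.num_bands // (n // 2 + 1)

    def _create_mask(self, power, noise):
        n = len(power)
        sums = [0.0] * self.num_bands
        counts = [0] * self.num_bands
        for k in range(n // 2 + 1):
            b = self._band_of(k, n)
            sums[b] += power[k]
            counts[b] += 1
        # One gain function per frequency band (Wiener-like, floored).
        gains = [max(self.gain_floor, 1.0 - noise / (s / max(c, 1) + 1e-12))
                 for s, c in zip(sums, counts)]
        return [gains[self._band_of(k, n)] for k in range(n)]

    def process(self, frame):
        spectrum = dft(frame)                      # time -> frequency
        power = [abs(c) ** 2 for c in spectrum]
        noise = self._noise_estimate(power)
        label = self.classify(noise)
        if label == CREATE:                        # first classification
            self.mask = self._create_mask(power, noise)
            self.mask_noise = noise
            self.masks_created += 1
        # In the second classification the stored mask is applied as-is.
        suppressed = [g * c for g, c in zip(self.mask, spectrum)]
        return label, idft(suppressed)             # frequency -> time
```

In this sketch the mask-creation step is the only per-band computation that is skipped when the classifier returns `REUSE`, which is where a real system would expect to save work. Feeding the same frame twice to `Suppressor.process` yields `CREATE` followed by `REUSE`, and a later frame with a much higher noise level yields `CREATE` again.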