US 12,148,437 B2
Feature domain bandwidth extension and spectral rebalance for ASR data augmentation
Dushyant Sharma, Mountain House, CA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Dec. 10, 2021, as Appl. No. 17/547,322.
Prior Publication US 2023/0186925 A1, Jun. 15, 2023
Int. Cl. G10L 19/02 (2013.01); G10L 15/06 (2013.01)
CPC G10L 19/02 (2013.01) [G10L 15/063 (2013.01)]
20 Claims
 
1. A method of processing speech, comprising:
providing a first set of audio data having audio features in a first bandwidth;
down-sampling the first set of audio data to a second bandwidth lower than the first bandwidth;
producing, by a high frequency reconstruction network (HFRN), an estimate of audio features in the first bandwidth for the first set of audio data, based on at least the down-sampled audio data of the second bandwidth;
inputting, into the HFRN, a second set of audio data having audio features in the second bandwidth;
obtaining an equalization transfer function that maps a spectral tilt of the first set of audio data in the first bandwidth to a spectral tilt of the second set of audio data in the second bandwidth, wherein the equalization transfer function is multiplied by a random perturbation vector;
producing, by the HFRN, an estimate of audio features in the first bandwidth for the second set of audio data, based on the second set of audio data having audio features in the second bandwidth;
applying the equalization transfer function to the estimate of audio features in the first bandwidth for the second set of audio data; and
training a speech processing system (SPS) using i) the estimate of audio features in the first bandwidth for the first set of audio data, and ii) the estimate of audio features in the first bandwidth for the second set of audio data.
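
The data flow recited in claim 1 can be illustrated with a minimal sketch. The code below is not the patented implementation and rests on several assumptions that do not appear in the claim: features are log-magnitude spectra stored as (frames, bins) arrays; down-sampling is approximated by keeping only the low-frequency bins; the HFRN is replaced by a least-squares linear map standing in for a neural network; spectral tilt is taken as the slope of a line fitted to the long-term average log spectrum; and the random perturbation vector is drawn uniformly around 1. All function names and the `jitter` parameter are hypothetical.

```python
import numpy as np

def spectral_tilt(log_feats):
    """Slope of a straight line fitted to the long-term average log spectrum."""
    avg = log_feats.mean(axis=0)
    bins = np.arange(avg.shape[0])
    slope, _intercept = np.polyfit(bins, avg, 1)
    return slope

def equalization_gain(wide_feats, narrow_feats, rng, jitter=0.1):
    """Per-bin linear gain mapping the wideband tilt to the narrowband tilt,
    multiplied by a random perturbation vector (assumed uniform around 1)."""
    n_bins = wide_feats.shape[1]
    delta = spectral_tilt(narrow_feats) - spectral_tilt(wide_feats)
    gain = np.exp(delta * np.arange(n_bins))
    perturbation = rng.uniform(1.0 - jitter, 1.0 + jitter, size=n_bins)
    return gain * perturbation

def fit_hfrn(narrow_feats, wide_feats):
    """Least-squares linear map with bias; a stand-in for the HFRN, not a network."""
    def with_bias(x):
        return np.hstack([x, np.ones((x.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(with_bias(narrow_feats), wide_feats, rcond=None)
    return lambda feats: with_bias(feats) @ weights

def build_training_features(wide_feats, narrow_feats, seed=0):
    """Return the two feature sets that would be used to train the SPS."""
    rng = np.random.default_rng(seed)
    keep = narrow_feats.shape[1]
    down = wide_feats[:, :keep]            # first set, reduced to the second bandwidth
    hfrn = fit_hfrn(down, wide_feats)      # stand-in HFRN fitted on the first set
    est_first = hfrn(down)                 # wideband estimate for the first set
    est_second = hfrn(narrow_feats)        # wideband estimate for the second set
    gain = equalization_gain(wide_feats, narrow_feats, rng)
    # Multiplying by the gain in the linear domain is adding its log in the log domain.
    est_second_eq = est_second + np.log(gain)
    return est_first, est_second_eq

# Synthetic example: 16 wideband bins, 8 narrowband bins (values are arbitrary).
demo_rng = np.random.default_rng(1)
wide = demo_rng.normal(size=(50, 16)) - 0.2 * np.arange(16)
narrow = demo_rng.normal(size=(40, 8)) - 0.05 * np.arange(8)
est_first, est_second_eq = build_training_features(wide, narrow)
```

In this sketch the two returned arrays correspond to items i) and ii) of the training step; the speech processing system itself is left out.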