| CPC G10L 19/02 (2013.01) [G10L 15/063 (2013.01)] | 20 Claims |

|
1. A method of processing speech, comprising:
providing a first set of audio data having audio features in a first bandwidth;
down-sampling the first set of audio data to a second bandwidth lower than the first bandwidth;
producing, by a high frequency reconstruction network (HFRN), an estimate of audio features in the first bandwidth for the first set of audio data, based on at least the down-sampled audio data of the second bandwidth;
inputting, into the HFRN, a second set of audio data having audio features in the second bandwidth;
obtaining an equalization transfer function that maps a spectral tilt of the first set of audio data in the first bandwidth to a spectral tilt of the second set of audio data in the second bandwidth, wherein the equalization transfer function is multiplied by a random perturbation vector;
producing, by the HFRN, an estimate of audio features in the first bandwidth for the second set of audio data, based on the second set of audio data having audio features in the second bandwidth;
applying the equalization transfer function to the estimate of audio features in the first bandwidth for the second set of audio data; and
training a speech processing system (SPS) using i) the estimate of audio features in the first bandwidth for the first set of audio data, and ii) the estimate of audio features in the first bandwidth for the second set of audio data.
|
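The equalization steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and everything specific in it is an assumption: the function names `spectral_tilt_db_per_bin` and `equalization_transfer_function` are hypothetical, spectral tilt is modelled as the slope of a straight line fitted to the long-term average log-magnitude spectrum, both bandwidths are assumed to share the same frequency-bin spacing, the random perturbation vector is drawn as a per-bin Gaussian in dB, and random arrays stand in for real audio features and for the HFRN output.

```python
import numpy as np

def spectral_tilt_db_per_bin(mag):
    """Slope (dB per bin) of a line fitted to the long-term average log-magnitude spectrum.

    mag: array of shape (n_frames, n_bins) holding magnitude spectra.
    Modelling tilt as a fitted slope is an assumption of this sketch.
    """
    avg_db = 20.0 * np.log10(mag.mean(axis=0) + 1e-12)
    bins = np.arange(avg_db.size)
    slope, _intercept = np.polyfit(bins, avg_db, 1)
    return slope

def equalization_transfer_function(wide_mag, narrow_mag, n_out_bins, rng, perturb_std_db=1.0):
    """Hypothetical equalization transfer function (ETF) over n_out_bins bins.

    Maps the tilt of the first-bandwidth data toward the tilt of the
    second-bandwidth data, then multiplies by a random perturbation vector.
    """
    tilt_wide = spectral_tilt_db_per_bin(wide_mag)
    tilt_narrow = spectral_tilt_db_per_bin(narrow_mag)
    bins = np.arange(n_out_bins)
    # Per-bin gain that converts one tilt into the other (tilt difference in dB per bin).
    etf = 10.0 ** ((tilt_narrow - tilt_wide) * bins / 20.0)
    # Random perturbation vector: Gaussian in dB, converted to linear gain (assumed form).
    perturbation = 10.0 ** (rng.normal(0.0, perturb_std_db, n_out_bins) / 20.0)
    return etf * perturbation

rng = np.random.default_rng(0)
wide = np.abs(rng.normal(size=(50, 257)))           # stand-in: first set, first bandwidth
narrow = np.abs(rng.normal(size=(50, 129)))         # stand-in: second set, second bandwidth
hfrn_estimate = np.abs(rng.normal(size=(50, 257)))  # stand-in: HFRN estimate for second set

etf = equalization_transfer_function(wide, narrow, hfrn_estimate.shape[1], rng)
equalized = hfrn_estimate * etf                     # apply the ETF bin-wise to every frame
```

In this sketch the ETF is a single gain vector broadcast across all frames, so the equalized estimate keeps the shape of the HFRN output and could be passed on, together with the estimate for the first set, as training data for the SPS.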