US 12,266,346 B2
Noisy far-field speech recognition
Dading Chong, Hangzhou (CN); Zhaoyi Liu, Hangzhou (CN); Vijay Parthasarathy, San Jose, CA (US); and Xiao Song, Hangzhou (CN)
Assigned to Zoom Communications, Inc., San Jose, CA (US)
Filed by Zoom Communications, Inc., San Jose, CA (US)
Filed on Jul. 30, 2021, as Appl. No. 17/390,788.
Prior Publication US 2023/0033768 A1, Feb. 2, 2023
Int. Cl. G10L 15/00 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 25/84 (2013.01)
CPC G10L 15/063 (2013.01) [G10L 15/02 (2013.01); G10L 25/84 (2013.01)]
17 Claims
 
1. A method comprising:
training a model using audio recordings from noise scenarios in a set of training data;
decomposing a training signal from the set of training data into a message component and a noise component;
scaling the noise component by a random scale factor to obtain a scaled noise, wherein the random scale factor is a power with a base that is a constant and an exponent that includes a random variable;
adding the scaled noise to the message component to obtain a perturbed audio signal that is included in the set of training data;
training a first teacher model using a first subset of the set of training data associated with a first noise scenario of the noise scenarios;
training a second teacher model using a second subset of the set of training data associated with a second noise scenario of the noise scenarios; and
training a student model using soft labels output from the first teacher model and soft labels output from the second teacher model.
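
For illustration only, the noise-scaling and soft-label steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not say how the training signal is decomposed, so the message and noise components are taken as given. The base of 10, the uniform distribution over [-1, 1] for the exponent, and the averaging of per-teacher cross-entropy terms are all assumptions. The function names perturb, softmax and distillation_loss are hypothetical.

```python
import math
import random


def perturb(message, noise, base=10.0, low=-1.0, high=1.0, rng=None):
    """Return (message + base**u * noise, scale) with u drawn from [low, high].

    Assumption: the exponent is uniform on [low, high]; the claim only
    requires a constant base and an exponent that includes a random variable.
    """
    rng = rng or random.Random()
    if len(message) != len(noise):
        raise ValueError("message and noise must have the same length")
    exponent = rng.uniform(low, high)  # random variable in the exponent
    scale = base ** exponent           # constant base raised to a random power
    perturbed = [m + scale * n for m, n in zip(message, noise)]
    return perturbed, scale


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_soft_labels):
    """Average cross-entropy of the student against each teacher's soft labels.

    Assumption: the teachers' soft labels are weighted equally; the claim
    only states that soft labels from both teachers are used.
    """
    student_probs = softmax(student_logits)
    losses = [
        -sum(p * math.log(q) for p, q in zip(soft, student_probs))
        for soft in teacher_soft_labels
    ]
    return sum(losses) / len(losses)
```

In this sketch, perturb would be applied to a decomposed training signal to produce the perturbed audio signal that is added to the training set, and distillation_loss would be evaluated per frame with one soft-label vector from each noise-scenario teacher.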