US 12,260,874 B2
Speech denoising networks using speech and noise modeling
Chou-Chang Yang, San Jose, CA (US); Ching-Hua Lee, Mountain View, CA (US); Rakshith Sharma Srinivasa, Sunnyvale, CA (US); Yashas Malur Saidutta, San Jose, CA (US); Yilin Shen, San Jose, CA (US); and Hongxia Jin, San Jose, CA (US)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Nov. 22, 2022, as Appl. No. 18/058,104.
Claims priority of provisional application 63/395,741, filed on Aug. 5, 2022.
Prior Publication US 2024/0046946 A1, Feb. 8, 2024
Int. Cl. G10L 21/0232 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 21/0216 (2013.01); G10L 25/18 (2013.01)
CPC G10L 21/0232 (2013.01) [G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 25/18 (2013.01); G10L 2021/02166 (2013.01)]
20 Claims
 
9. An apparatus comprising:
at least one processing device configured to:
obtain noisy speech signals;
extract acoustic features from the noisy speech signals;
receive a predicted speech mask from a speech mask prediction model based on a first subset of the acoustic features;
receive a predicted noise mask from a noise mask prediction model based on a second subset of the acoustic features;
provide predicted speech features determined using the predicted speech mask and predicted noise features determined using the predicted noise mask to a filtering mask prediction model; and
generate a clean speech signal using a predicted filtering mask output by the filtering mask prediction model.
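
The data flow recited in claim 9 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the acoustic features are taken to be STFT magnitude and log-magnitude, the "first subset" is the log-magnitude and the "second subset" is the magnitude, and the three models (speech_mask_model, noise_mask_model, filtering_mask_model) are hypothetical hand-written placeholders standing in for trained networks. Only the order of the steps follows the claim.

```python
import numpy as np

FRAME_LEN, HOP = 256, 128  # assumed analysis parameters, not from the patent


def stft(signal):
    """Return a (frames, bins) complex spectrogram using a Hann window."""
    window = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    frames = np.stack([signal[i * HOP:i * HOP + FRAME_LEN] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(FRAME_LEN)
    frames = np.fft.irfft(spec, n=FRAME_LEN, axis=1) * window
    out = np.zeros(HOP * (len(spec) - 1) + FRAME_LEN)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME_LEN] += frame
        norm[i * HOP:i * HOP + FRAME_LEN] += window ** 2
    return out / np.maximum(norm, 1e-8)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def speech_mask_model(log_mag):
    """Hypothetical placeholder: favors bins above the per-frame mean."""
    return sigmoid(log_mag - log_mag.mean(axis=1, keepdims=True))


def noise_mask_model(mag):
    """Hypothetical placeholder: favors bins below the per-frame mean."""
    return sigmoid(mag.mean(axis=1, keepdims=True) - mag)


def filtering_mask_model(speech_feats, noise_feats):
    """Hypothetical placeholder: Wiener-like ratio of the two estimates."""
    s2, n2 = speech_feats ** 2, noise_feats ** 2
    return s2 / (s2 + n2 + 1e-8)


def denoise(noisy):
    # Obtain noisy speech and extract acoustic features.
    spec = stft(noisy)
    mag = np.abs(spec)
    features = {"magnitude": mag, "log_magnitude": np.log(mag + 1e-8)}
    # Predict a speech mask from a first subset of the features
    # and a noise mask from a second subset (assumed split).
    speech_mask = speech_mask_model(features["log_magnitude"])
    noise_mask = noise_mask_model(features["magnitude"])
    # Predicted speech and noise features feed the filtering mask model.
    speech_feats = speech_mask * mag
    noise_feats = noise_mask * mag
    filtering_mask = filtering_mask_model(speech_feats, noise_feats)
    # Generate the clean speech signal using the predicted filtering mask.
    clean = istft(filtering_mask * spec)
    return clean, speech_mask, noise_mask, filtering_mask
```

Calling denoise on a one-dimensional NumPy array of samples returns the filtered waveform together with the three intermediate masks. In this sketch the filtering mask is applied to the complex spectrogram, so the noisy phase is reused for reconstruction; that is a simplification chosen here, not something the claim specifies.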