US 12,444,429 B2
Speech identification and extraction from noise using extended high frequency information
Brian B Monson, Champaign, IL (US); and Rohit M Ananthanarayana, Champaign, IL (US)
Assigned to The Board of Trustees of the University of Illinois, Urbana, IL (US)
Filed by THE BOARD OF TRUSTEES OF THE UNIVERSITY OF ILLINOIS, Urbana, IL (US)
Filed on Dec. 21, 2022, as Appl. No. 18/085,705.
Claims priority of provisional application 63/292,307, filed on Dec. 21, 2021.
Prior Publication US 2023/0197099 A1, Jun. 22, 2023
Int. Cl. G10L 21/0232 (2013.01); G10L 21/0224 (2013.01); G10L 21/0272 (2013.01); G10L 21/0308 (2013.01); G10L 25/09 (2013.01); G10L 25/18 (2013.01); G10L 25/21 (2013.01); G10L 25/84 (2013.01)
CPC G10L 21/0232 (2013.01) [G10L 21/0224 (2013.01); G10L 21/0272 (2013.01); G10L 21/0308 (2013.01); G10L 25/09 (2013.01); G10L 25/18 (2013.01); G10L 25/21 (2013.01); G10L 25/84 (2013.01)]
16 Claims
 
1. A non-transitory computer readable medium comprising program instructions executable by at least one processor to cause the at least one processor to perform a method comprising:
obtaining a first audio sample;
determining that a first portion of the first audio sample contains frequency content at frequencies higher than 5.6 kilohertz that exceeds a threshold energy level;
responsive to determining that the first portion contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level, determining a first audio filter based on the first portion of the first audio sample by:
determining a first spectrogram for the first portion; and
performing non-negative matrix factorization to generate a first matrix and a second matrix whose product corresponds to a low-frequency portion of the first spectrogram that is below a threshold frequency, wherein the first matrix is composed of a set of column vectors that span along a frequency dimension of the first spectrogram, and wherein the second matrix is composed of a set of row vectors that span along a time dimension of the first spectrogram;
subsequent to obtaining the first audio sample, obtaining a second audio sample; and
applying the first audio filter to the second audio sample to generate a first audio output by:
determining a second spectrogram for the second audio sample;
applying the first matrix to a low-frequency portion of the second spectrogram that is below the threshold frequency to generate a third spectrogram that represents noise content of the second audio sample; and
using the third spectrogram to remove the noise content from the second audio sample, thereby generating the first audio output.
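
For illustration only, the steps recited in claim 1 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: the sample rate, STFT settings, the 4 kHz low-band cutoff, the NMF rank, the multiplicative-update solver, and the function names (`spectrogram`, `ehf_energy`, `nmf`, `learn_filter`, `apply_filter`) are all hypothetical choices. "Applying the first matrix" is interpreted here as holding that matrix fixed while fitting new activations, and noise removal is done by simple magnitude subtraction; the claim does not specify either choice.

```python
import numpy as np

FS = 32000          # assumed sample rate in Hz
N_FFT = 512         # assumed STFT frame length
HOP = 256           # assumed hop size
EHF_HZ = 5600.0     # 5.6 kHz boundary recited in the claim
CUTOFF_HZ = 4000.0  # assumed value for the claim's "threshold frequency"

FREQS = np.fft.rfftfreq(N_FFT, 1.0 / FS)
LOW = FREQS < CUTOFF_HZ  # boolean mask selecting the low-frequency bins


def spectrogram(x):
    """Magnitude STFT with a Hann window, shape (freq_bins, frames)."""
    win = np.hanning(N_FFT)
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def ehf_energy(spec):
    """Mean power in the bins above 5.6 kHz."""
    return float(np.mean(spec[FREQS > EHF_HZ] ** 2))


def nmf(V, rank, W=None, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF so that V is approximately W @ H.

    If W is supplied it is held fixed and only H is fitted.
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def learn_filter(first_portion, energy_threshold, rank=4):
    """Return the first matrix W, or None if the energy test above 5.6 kHz fails."""
    spec = spectrogram(first_portion)            # first spectrogram
    if ehf_energy(spec) <= energy_threshold:
        return None
    W, _ = nmf(spec[LOW], rank)                  # columns of W span frequency
    return W


def apply_filter(W, second_sample):
    """Estimate noise in the low band with W and subtract it."""
    spec = spectrogram(second_sample)            # second spectrogram
    _, H = nmf(spec[LOW], W.shape[1], W=W)       # fit activations, W fixed
    noise = W @ H                                # third spectrogram (noise estimate)
    cleaned = spec.copy()
    cleaned[LOW] = np.maximum(spec[LOW] - noise, 0.0)
    return cleaned, noise
```

The sketch stops at a cleaned magnitude spectrogram. Producing an audio output would additionally require an inverse STFT, for example by reusing the phase of the second audio sample, which is omitted here for brevity.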