CPC G06F 16/683 (2019.01) [G10L 25/21 (2013.01); G10L 25/51 (2013.01)] | 20 Claims |
1. A tangible, non-transitory computer readable medium comprising instructions, which when executed, cause one or more processors to perform a set of operations comprising:
transforming an audio signal into a frequency domain including a plurality of time-frequency bins, wherein each time-frequency bin of the plurality of time-frequency bins corresponds to an intersection of a frequency bin and a time bin and contains a portion of the audio signal;
determining a first audio segment comprising a first group of time-frequency bins, wherein the first group of time-frequency bins comprises a first time-frequency bin;
determining a second audio segment comprising a second group of time-frequency bins, wherein the second group of time-frequency bins comprises a second time-frequency bin;
determining an exponential mean value associated with the second time-frequency bin based on a magnitude of the audio signal associated with the second time-frequency bin;
normalizing the first time-frequency bin based on the exponential mean value;
generating a fingerprint of the audio signal based on the normalized first time-frequency bin;
generating a subfingerprint by selecting energy extrema associated with the normalized first time-frequency bin, wherein the fingerprint comprises the subfingerprint, and wherein selecting the energy extrema comprises selecting one or more normalized time-frequency bins with highest normalized energy values; and
based on the normalized first time-frequency bin, discarding the first group of time-frequency bins.
|
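The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading under stated assumptions, not the patented implementation. The claim does not define these details, so the following are assumptions: the "exponential mean value" is read as an exponentially weighted average over the time bins of the second segment, taken per frequency bin; normalization is division by that mean; segments are consecutive, non-overlapping groups of frames; and all function names and parameters (`frame_len`, `seg_frames`, `alpha`, `k`) are hypothetical.

```python
import cmath
import math


def stft_frames(signal, frame_len):
    """Yield one list of frequency-bin magnitudes per time bin (naive DFT)."""
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        yield [
            abs(sum(x * cmath.exp(-2j * math.pi * f * n / frame_len)
                    for n, x in enumerate(frame)))
            for f in range(frame_len // 2 + 1)
        ]


def exponential_mean(values, alpha=0.5):
    """Assumed reading of 'exponential mean': exponentially weighted average."""
    mean = values[0]
    for v in values[1:]:
        mean = alpha * v + (1 - alpha) * mean
    return mean


def normalize_segment(first_seg, second_seg, alpha=0.5, eps=1e-12):
    """Divide each bin of the first segment by the exponential mean of the
    second segment's magnitudes at the same frequency (assumed pairing)."""
    n_freq = len(first_seg[0])
    means = [exponential_mean([frame[f] for frame in second_seg], alpha)
             for f in range(n_freq)]
    return [[frame[f] / (means[f] + eps) for f in range(n_freq)]
            for frame in first_seg]


def subfingerprint(normalized, k=3):
    """Select the k bins with the highest normalized values (energy extrema)
    and return their (time_bin, frequency_bin) coordinates, sorted."""
    bins = [(value, t, f)
            for t, frame in enumerate(normalized)
            for f, value in enumerate(frame)]
    bins.sort(key=lambda b: -b[0])
    return sorted((t, f) for _, t, f in bins[:k])


def fingerprint(signal, frame_len=8, seg_frames=2, k=3):
    """Build a fingerprint as a list of subfingerprints, one per first segment."""
    subs, first, current = [], None, []
    for frame in stft_frames(signal, frame_len):
        current.append(frame)
        if len(current) < seg_frames:
            continue
        if first is not None:
            subs.append(subfingerprint(normalize_segment(first, current), k))
        # The previous first segment is dropped here; only its extrema remain.
        first, current = current, []
    return subs


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(48)]
    print(fingerprint(tone))
```

In this sketch each segment is normalized against the segment that follows it, so the last segment of the signal produces no subfingerprint. Whether the second segment precedes, follows, or overlaps the first is left open by the claim language and is chosen here only for simplicity.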