US 12,235,896 B2
Methods and apparatus to fingerprint an audio signal via exponential normalization
Alexander Berrian, Emeryville, CA (US); Matthew James Wilkinson, Emeryville, CA (US); and Robert Coover, Orinda, CA (US)
Assigned to Gracenote, Inc., New York, NY (US)
Filed by Gracenote, Inc., Emeryville, CA (US)
Filed on May 24, 2024, as Appl. No. 18/674,678.
Application 18/674,678 is a continuation of application No. 16/696,874, filed on Nov. 26, 2019, granted, now Pat. No. 12,032,628.
Prior Publication US 2024/0386051 A1, Nov. 21, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 25/51 (2013.01); G06F 16/683 (2019.01); G10L 25/21 (2013.01)
CPC G06F 16/683 (2019.01) [G10L 25/21 (2013.01); G10L 25/51 (2013.01)]
20 Claims
 
1. A tangible, non-transitory computer readable medium comprising instructions, which when executed, cause one or more processors to perform a set of operations comprising:
transforming an audio signal into a frequency domain including a plurality of time-frequency bins, wherein each time-frequency bin of the plurality of time-frequency bins corresponds to an intersection of a frequency bin and a time bin and contains a portion of the audio signal;
determining a first audio segment comprising a first group of time-frequency bins, wherein the first group of time-frequency bins comprises a first time-frequency bin;
determining a second audio segment comprising a second group of time-frequency bins, wherein the second group of time-frequency bins comprises a second time-frequency bin;
determining an exponential mean value associated with the second time-frequency bin based on a magnitude of the audio signal associated with the second time-frequency bin;
normalizing the first time-frequency bin based on the exponential mean value;
generating a fingerprint of the audio signal based on the normalized first time-frequency bin;
generating a subfingerprint by selecting energy extrema associated with the normalized first time-frequency bin, wherein the fingerprint comprises the subfingerprint, and wherein selecting the energy extrema comprises selecting one or more normalized time-frequency bins with highest normalized energy values; and
based on the normalized first time-frequency bin, discarding the first group of time-frequency bins.
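
The operations recited in claim 1 can be illustrated with a minimal sketch. The claim does not give formulas, so everything here is an assumption made for illustration: the exponential mean is modeled as a running exponentially weighted average of magnitudes along the time axis, a time bin is normalized by dividing it by the mean accumulated over the preceding time bins, and energy extrema are taken as the top_k largest normalized values in each time bin. The names stft_magnitude, exponential_normalize, subfingerprints, alpha, and top_k are hypothetical and do not come from the patent.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram of shape (n_freq_bins, n_time_bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def exponential_normalize(mag, alpha=0.1, eps=1e-12):
    """Divide each time bin by an exponential mean of the earlier time bins.

    Assumed update rule: mean <- (1 - alpha) * mean + alpha * magnitude.
    The first time bin has no history, so its column is left at zero.
    """
    normalized = np.zeros_like(mag)
    mean = mag[:, 0].copy()  # seed the running mean with the first time bin
    for t in range(1, mag.shape[1]):
        normalized[:, t] = mag[:, t] / (mean + eps)
        mean = (1.0 - alpha) * mean + alpha * mag[:, t]
    return normalized

def subfingerprints(normalized, top_k=3):
    """For each time bin, indices of the top_k highest normalized values."""
    order = np.argsort(normalized, axis=0)[::-1]  # descending per time bin
    return order[:top_k].T  # shape (n_time_bins, top_k)

# Illustrative input: faint noise, then a 1 kHz tone switched on halfway
# through (8 kHz sample rate assumed, so the tone sits in frequency bin 32).
rng = np.random.default_rng(0)
demo_signal = 0.01 * rng.standard_normal(16000)
demo_signal[8000:] += np.sin(2 * np.pi * 1000 * np.arange(8000) / 8000)

demo_mag = stft_magnitude(demo_signal)
demo_norm = exponential_normalize(demo_mag)
demo_fp = subfingerprints(demo_norm, top_k=3)
```

In this sketch the running mean is the only state carried from one time bin to the next, so a streaming variant could drop each group of time-frequency bins once it has been normalized and its extrema selected. That is one possible reading of the discarding step, not a statement of how the claimed method is implemented.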