US 12,306,871 B2
Audio identification during performance
Dale T. Roberts, San Anselmo, CA (US); Bob Coover, Orinda, CA (US); Nicola Marcantonio, San Francisco, CA (US); and Markus K. Cremer, Orinda, CA (US)
Assigned to Gracenote, Inc., New York, NY (US)
Filed by Gracenote, Inc., Emeryville, CA (US)
Filed on Feb. 6, 2023, as Appl. No. 18/165,107.
Application 18/165,107 is a continuation of application No. 17/102,012, filed on Nov. 23, 2020, granted, now 11,574,008.
Application 17/102,012 is a continuation of application No. 15/888,998, filed on Feb. 5, 2018, granted, now 10,846,334, issued on Nov. 24, 2020.
Application 15/888,998 is a continuation of application No. 14/258,263, filed on Apr. 22, 2014, abandoned.
Prior Publication US 2023/0185847 A1, Jun. 15, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/683 (2019.01); G06F 18/231 (2023.01); G06Q 50/00 (2012.01); G10L 25/18 (2013.01); G10L 25/51 (2013.01)
CPC G06F 16/683 (2019.01) [G06F 18/231 (2023.01); G06Q 50/01 (2013.01); G10L 25/18 (2013.01); G10L 25/51 (2013.01)]
20 Claims
 
1. A non-transitory machine-readable storage medium comprising instructions that, when executed by one or more processors of a computer system, cause the computer system to perform a set of operations comprising:
receiving, from a computing device at a live performance of an audio piece, a fingerprint of a segment of a live version of the audio piece, wherein the fingerprint contains a query for identification of the audio piece during the live performance of the live version of the audio piece;
computing a similarity matrix between at least one reference fingerprint and the fingerprint, wherein computing the similarity matrix comprises:
generating a binary image of a log-frequency spectrogram representing the fingerprint, wherein a plurality of pixels of the binary image correspond to a time frame and frequency channel pair, and wherein at least one frequency channel represents a corresponding quarter tone frequency channel in a range from musical note C3 to musical note C8; and
generating a matrix product of the binary image and a plurality of reference fingerprints including the at least one reference fingerprint; and
identifying the audio piece, wherein identifying the audio piece is based on a match between the at least one reference fingerprint and the fingerprint, wherein the match is based on determining a threshold similarity between the at least one reference fingerprint and the fingerprint, and wherein determining the threshold similarity between the at least one reference fingerprint and the fingerprint is based on the similarity matrix.
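
Illustrative note (not part of the claim): the operations recited in claim 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not say how the spectrogram is binarized, how the similarity matrix is reduced to a score, or what the threshold is, so the per-frame mean rule, the sliding-offset score, and the default threshold below are all hypothetical. So are the function names and the A4 = 440 Hz tuning used to place C3. For simplicity the product is taken against one reference image at a time, and the query is assumed to be no longer than the reference.

```python
QUARTER_TONES_PER_OCTAVE = 24
# Assumption: equal temperament with A4 = 440 Hz; C3 lies 21 semitones below A4.
C3_HZ = 440.0 * 2 ** (-21 / 12)


def quarter_tone_channels(octaves=5):
    """Center frequencies in quarter-tone steps from C3 up to C8 (inclusive)."""
    count = octaves * QUARTER_TONES_PER_OCTAVE + 1
    return [C3_HZ * 2 ** (k / QUARTER_TONES_PER_OCTAVE) for k in range(count)]


def binarize(spectrogram):
    """Turn a frames-by-channels magnitude grid into a 0/1 image.

    Assumption: a pixel is 1 when its magnitude exceeds the mean of its frame.
    """
    image = []
    for frame in spectrogram:
        mean = sum(frame) / len(frame)
        image.append([1 if value > mean else 0 for value in frame])
    return image


def similarity_matrix(query_image, reference_image):
    """Matrix product Q x R^T.

    Entry [i][j] counts the channels that are active in both query frame i
    and reference frame j.
    """
    return [
        [sum(q * r for q, r in zip(q_frame, r_frame)) for r_frame in reference_image]
        for q_frame in query_image
    ]


def best_offset_score(matrix):
    """Highest mean along any full-length diagonal of the similarity matrix.

    Each diagonal corresponds to one constant time offset of the query
    within the reference. Returns 0.0 if the query is longer than the reference.
    """
    rows, cols = len(matrix), len(matrix[0])
    scores = [
        sum(matrix[i][i + offset] for i in range(rows)) / rows
        for offset in range(cols - rows + 1)
    ]
    return max(scores, default=0.0)


def is_match(query_image, reference_image, threshold=1.5):
    """Hypothetical decision rule: match when the best offset score reaches threshold."""
    matrix = similarity_matrix(query_image, reference_image)
    return best_offset_score(matrix) >= threshold
```

With five octaves between C3 and C8, `quarter_tone_channels()` yields 121 channels. A query would be tested with `is_match(binarize(query_spectrogram), binarize(reference_spectrogram))`, where each spectrogram is a list of frames and each frame is a list of per-channel magnitudes. A practical system would tune the threshold against the fingerprint density rather than use a fixed constant.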