US 12,293,772 B2
Method for identifying an audio signal
Pradyumna Thiruvenkatanathan, London (GB); Guy Spyropoulos, London (GB); and Anindya Moitra, London (GB)
Assigned to EARZZ LIMITED, Devon (GB)
Filed by EARZZ LIMITED, Devon (GB)
Filed on Aug. 25, 2022, as Appl. No. 17/895,292.
Claims priority of application No. 2112306 (GB), filed on Aug. 27, 2021.
Prior Publication US 2023/0060936 A1, Mar. 2, 2023
Int. Cl. G10L 25/21 (2013.01); G10L 25/18 (2013.01); G10L 25/24 (2013.01); G10L 25/78 (2013.01); H04R 3/00 (2006.01)
CPC G10L 25/21 (2013.01) [G10L 25/18 (2013.01); G10L 25/24 (2013.01); G10L 25/78 (2013.01); H04R 3/00 (2013.01); H04R 2420/07 (2013.01); H04R 2430/03 (2013.01)]
19 Claims
 
1. A computer-implemented method for identifying at least one audio signal, the method comprising the steps of:
receiving audio data at a receiver module from at least one audio sensor; and
processing the audio data using a signal recognition module;
wherein processing the audio data using the signal recognition module comprises:
based on the received audio data, determining at least one of:
one or more time-varying vector arrays of octave band energies, and
one or more time-varying vector arrays of fractional octave band energies;
determining one or more time-varying vector arrays of Mel-Frequency Cepstral Coefficients (MFCC) values based on the received audio data;
generating audio feature image data based on the one or more time-varying vector arrays of MFCC values, and at least one of:
the one or more time-varying vector arrays of octave band energies, and
the one or more time-varying vector arrays of fractional octave band energies;
wherein the audio feature image data is generated by combining vector values of the one or more time-varying vector arrays of MFCC values and at least one of:
vector values of the one or more time-varying vector arrays of octave band energies, and
vector values of the one or more time-varying vector arrays of fractional octave band energies
into a single matrix; and
identifying at least one audio signal using a first model based on the audio feature image data;
wherein the first model comprises an image recognition model to identify a pattern in the audio feature image data to identify the at least one audio signal.
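
The feature-combination steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every function name, the frame length, hop size, octave band centres, mel filter count and coefficient count are assumptions chosen for illustration, and a naive DFT from the standard library stands in for an optimized FFT so the example stays self-contained. For each time frame it computes octave band energies and MFCC values, then stacks both into a single matrix whose rows are features and whose columns are time frames.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (bins 0 .. N/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2 / n)
    return spec

def octave_band_energies(spec, sample_rate, n_fft, centres):
    """Sum spectral power inside each octave band [fc/sqrt(2), fc*sqrt(2))."""
    energies = []
    for fc in centres:
        lo, hi = fc / math.sqrt(2), fc * math.sqrt(2)
        energies.append(sum(p for k, p in enumerate(spec)
                            if lo <= k * sample_rate / n_fft < hi))
    return energies

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mfcc(spec, sample_rate, n_fft, n_filters, n_coeffs):
    """Triangular mel filterbank, log of filter energies, then DCT-II."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    log_e = []
    for m in range(1, n_filters + 1):
        left, mid, right = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for k, p in enumerate(spec):
            f = k * sample_rate / n_fft
            if left < f <= mid:
                e += p * (f - left) / (mid - left)
            elif mid < f < right:
                e += p * (right - f) / (right - mid)
        log_e.append(math.log(e + 1e-10))  # small floor avoids log(0)
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for c in range(n_coeffs)]

def feature_image(signal, sample_rate, frame_len=64, hop=32,
                  centres=(500, 1000, 2000), n_filters=8, n_coeffs=4):
    """Build one matrix: octave band rows first, MFCC rows after; columns are frames."""
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Hann-windowed frame
        frame = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)))
                 for i, x in enumerate(signal[start:start + frame_len])]
        spec = power_spectrum(frame)
        bands = octave_band_energies(spec, sample_rate, frame_len, centres)
        coeffs = mfcc(spec, sample_rate, frame_len, n_filters, n_coeffs)
        columns.append(bands + coeffs)  # combine both feature vectors per frame
    return [list(row) for row in zip(*columns)]  # transpose to features x time

# Hypothetical input: a 1 kHz tone sampled at 8 kHz
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(320)]
image = feature_image(tone, sample_rate=8000)
```

In this sketch `image` plays the role of the audio feature image data. The final claimed step, in which an image recognition model identifies a pattern in that matrix, is not shown here; any such model would take the matrix (typically after normalization) as a single-channel image.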