US 12,293,772 B2
Method for identifying an audio signal
Pradyumna Thiruvenkatanathan, London (GB); Guy Spyropoulos, London (GB); and Anindya Moitra, London (GB)
Assigned to EARZZ LIMITED, Devon (GB)
Filed by EARZZ LIMITED, Devon (GB)
Filed on Aug. 25, 2022, as Appl. No. 17/895,292.
Claims priority of application No. 2112306 (GB), filed on Aug. 27, 2021.
Prior Publication US 2023/0060936 A1, Mar. 2, 2023
Int. Cl. G10L 25/21 (2013.01); G10L 25/18 (2013.01); G10L 25/24 (2013.01); G10L 25/78 (2013.01); H04R 3/00 (2006.01)

CPC G10L 25/21 (2013.01) [G10L 25/18 (2013.01); G10L 25/24 (2013.01); G10L 25/78 (2013.01); H04R 3/00 (2013.01); H04R 2420/07 (2013.01); H04R 2430/03 (2013.01)]

19 Claims

1. A computer-implemented method for identifying at least one audio signal, the method comprising the steps of:

receiving audio data at a receiver module from at least one audio sensor; and

processing the audio data using a signal recognition module;

wherein processing the audio data using the signal recognition module comprises:

based on the received audio data, determining at least one of:

one or more time-varying vector arrays of octave band energies, and

one or more time-varying vector arrays of fractional octave band energies;

determining one or more time-varying vector arrays of Mel-Frequency Cepstral Coefficients (MFCC) values based on the received audio data;

generating audio feature image data based on the one or more time-varying vector arrays of MFCC values, and at least one of:

the one or more time-varying vector arrays of octave band energies, and

the one or more time-varying vector arrays of fractional octave band energies;

wherein the audio feature image data is generated by combining vector values of the one or more time-varying vector arrays of MFCC values and at least one of:

vector values of the one or more time-varying vector arrays of octave band energies, and

vector values of the one or more time-varying vector arrays of fractional octave band energies

into a single matrix; and

identifying at least one audio signal using a first model based on the audio feature image data;

wherein the first model comprises an image recognition model to identify a pattern in the audio feature image data to identify the at least one audio signal.
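
The feature-generation steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The frame size, hop, band centre frequencies, mel filter count, coefficient count and all function names are assumptions chosen for illustration and do not come from the claim. For each frame the sketch computes octave band energies and MFCC values, then places both in one row of a single matrix. The image recognition model that would consume the matrix is omitted.

```python
import cmath
import math

def frames(signal, size, hop):
    """Split a signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec

def octave_band_energies(spec, sample_rate, n_fft, centres):
    """Sum spectral power inside each octave band [fc/sqrt(2), fc*sqrt(2))."""
    energies = []
    for fc in centres:
        lo, hi = fc / math.sqrt(2), fc * math.sqrt(2)
        energies.append(sum(p for k, p in enumerate(spec)
                            if lo <= k * sample_rate / n_fft < hi))
    return energies

def mel(f):
    return 2595 * math.log10(1 + f / 700)

def inv_mel(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc(spec, sample_rate, n_fft, n_mels, n_coeffs):
    """Triangular mel filterbank, then log, then DCT-II."""
    top = mel(sample_rate / 2)
    pts = [inv_mel(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    log_e = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = pts[m - 1], pts[m], pts[m + 1]
        e = 0.0
        for k, p in enumerate(spec):
            f = k * sample_rate / n_fft
            if lo < f <= mid:
                e += p * (f - lo) / (mid - lo)   # rising edge of the triangle
            elif mid < f < hi:
                e += p * (hi - f) / (hi - mid)   # falling edge of the triangle
        log_e.append(math.log(e + 1e-10))        # small offset avoids log(0)
    return [sum(le * math.cos(math.pi * c * (j + 0.5) / n_mels)
                for j, le in enumerate(log_e))
            for c in range(n_coeffs)]

def feature_image(signal, sample_rate, size=64, hop=32,
                  centres=(500, 1000, 2000), n_mels=8, n_coeffs=4):
    """Build one matrix: rows are frames, columns are MFCCs then band energies."""
    rows = []
    for fr in frames(signal, size, hop):
        spec = power_spectrum(fr)
        rows.append(mfcc(spec, sample_rate, size, n_mels, n_coeffs)
                    + octave_band_energies(spec, sample_rate, size, centres))
    return rows

# Demo input (assumed): a 1 kHz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
image = feature_image(tone, sr)
```

In this sketch each row of `image` is one time step, so the matrix can be treated as a two-dimensional image and passed to any image classifier. Fractional octave bands (for example one-third octave) could be handled in the same way by narrowing the band edges around each centre frequency.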