| CPC G10L 25/51 (2013.01) | 16 Claims |

|
1. A system for automatically selecting a sound recognition model for an environment based on audio data and image data associated with the environment, the system comprising:
a camera;
a microphone;
a memory including a plurality of sound recognition models; and
an electronic processor configured to
receive, via an input device, a selection of one or more sound recognition tasks;
receive the audio data associated with the environment from the microphone;
receive the image data associated with the environment from the camera;
determine one or more characteristics of the environment based on the audio data and the image data;
for each of the one or more sound recognition tasks selected, select the sound recognition model from the plurality of sound recognition models based on the one or more characteristics of the environment and the selected sound recognition task;
receive additional audio data associated with the environment from the microphone; and
analyze the additional audio data using the sound recognition model to perform a sound recognition task, wherein the sound recognition task includes generating a prediction regarding the additional audio data.
|
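The processor steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it is hypothetical and appears nowhere in the claim: `SoundRecognitionModel`, `determine_characteristics`, `select_model`, `run_pipeline`, and `threshold_detector`. The same goes for the two example characteristics (location type and noise level), the 0.1 noise threshold, and the amplitude-threshold "models", which are stand-ins for real classifiers. Image data is assumed to arrive as a set of object labels and audio data as a sequence of samples in [-1, 1].

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical representation of "characteristics of the environment".
Characteristics = Tuple[str, str]  # (location type, noise level)


@dataclass(frozen=True)
class SoundRecognitionModel:
    name: str
    task: str       # sound recognition task the model was built for
    location: str   # environment the model is suited to
    noise: str
    predict: Callable[[Sequence[float]], str]


def threshold_detector(threshold: float) -> Callable[[Sequence[float]], str]:
    """Stand-in for a trained model: flags audio whose peak exceeds a threshold."""
    def predict(audio: Sequence[float]) -> str:
        peak = max((abs(s) for s in audio), default=0.0)
        return "glass break" if peak > threshold else "no event"
    return predict


def determine_characteristics(audio: Sequence[float],
                              image_labels: Set[str]) -> Characteristics:
    """Derive characteristics from both audio data and image data (assumed rules)."""
    location = "outdoor" if "sky" in image_labels else "indoor"
    level = sum(abs(s) for s in audio) / max(len(audio), 1)
    noise = "noisy" if level > 0.1 else "quiet"  # 0.1 is an arbitrary cutoff
    return (location, noise)


def select_model(models: Iterable[SoundRecognitionModel], task: str,
                 characteristics: Characteristics) -> SoundRecognitionModel:
    """Pick the stored model for the task that best matches the environment."""
    location, noise = characteristics
    candidates = [m for m in models if m.task == task]
    if not candidates:
        raise LookupError(f"no stored model for task {task!r}")
    # Score = number of matching characteristics; ties keep the first candidate.
    return max(candidates,
               key=lambda m: (m.location == location) + (m.noise == noise))


def run_pipeline(models: List[SoundRecognitionModel], tasks: Sequence[str],
                 audio: Sequence[float], image_labels: Set[str],
                 additional_audio: Sequence[float]) -> Dict[str, str]:
    """For each selected task: select a model, then predict on additional audio."""
    characteristics = determine_characteristics(audio, image_labels)
    return {task: select_model(models, task, characteristics).predict(additional_audio)
            for task in tasks}


# Hypothetical contents of the memory holding the plurality of models.
MODELS = [
    SoundRecognitionModel("glass_indoor_quiet", "glass_break",
                          "indoor", "quiet", threshold_detector(0.5)),
    SoundRecognitionModel("glass_outdoor_noisy", "glass_break",
                          "outdoor", "noisy", threshold_detector(0.8)),
]
```

In this sketch the same additional audio can yield different predictions depending on which model the environment characteristics select, which is the point of choosing the model per environment and per task.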