CPC G10L 15/16 (2013.01) [G06N 3/08 (2013.01); G06N 20/00 (2019.01); G10L 25/51 (2013.01)]; 17 Claims
1. An electronic device, comprising:
one or more microphones configured to receive a sound; and
a processor and memory having stored therein a plurality of instructions that, when executed by the processor, implement:
at least one feature detector configured to receive, from the one or more microphones, one or more audio signals that comprise the sound, and to process the one or more audio signals to i) detect whether a sound source has dynamic location or static location, ii) detect whether the sound source is producing music or speech as a sound class, and iii) determine a third feature, wherein the third feature is whether the sound class varies between music and speech more frequently or less frequently over time; and
a sound classifier including a machine learning model that is configured to determine whether the sound is generated by a speaker based upon i) the at least one feature detector having detected whether the sound source has dynamic location or static location, ii) the at least one feature detector having detected whether the sound source is producing music or speech, and iii) the third feature.
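Claim 1 describes an architecture rather than an algorithm, so the following is only a minimal Python sketch of how the three claimed features might feed a machine learning model. Everything in it is an assumption not stated in the claim: the feature detectors are stood in for by hypothetical per-frame direction-of-arrival angles (`azimuths_deg`) and per-frame `"music"`/`"speech"` labels; the 10-degree movement threshold, the majority vote, and the switch-rate definition are illustrative choices; the model is a plain logistic regression trained by gradient descent on a tiny hand-made dataset (`TRAIN`), not necessarily the kind of model the patent contemplates; and "speaker" is read as a loudspeaker (for example, a television) as opposed to a person talking in the room.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Features:
    """The three claimed features, encoded as numbers for the model."""
    dynamic_location: float   # 1.0 = source moves, 0.0 = static (feature i)
    is_music: float           # 1.0 = mostly music, 0.0 = mostly speech (feature ii)
    class_switch_rate: float  # fraction of frame pairs where the class flips (feature iii)


def extract_features(azimuths_deg: Sequence[float],
                     frame_classes: Sequence[str],
                     movement_threshold_deg: float = 10.0) -> Features:
    """Hypothetical detector: per-frame arrival angles and per-frame
    'music'/'speech' labels are assumed to come from upstream stages."""
    # i) Dynamic vs static: spread of arrival angles (assumes no 360-degree wrap).
    spread = max(azimuths_deg) - min(azimuths_deg)
    dynamic = 1.0 if spread > movement_threshold_deg else 0.0
    # ii) Music vs speech: majority vote over the window.
    music_frames = sum(1 for c in frame_classes if c == "music")
    is_music = 1.0 if 2 * music_frames > len(frame_classes) else 0.0
    # iii) How often the class changes between consecutive frames.
    switches = sum(1 for a, b in zip(frame_classes, frame_classes[1:]) if a != b)
    rate = switches / max(len(frame_classes) - 1, 1)
    return Features(dynamic, is_music, rate)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _vector(f: Features) -> List[float]:
    # The leading 1.0 is the bias term.
    return [1.0, f.dynamic_location, f.is_music, f.class_switch_rate]


def train_logistic(samples: Sequence[Features], labels: Sequence[int],
                   lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Batch gradient descent on the average logistic (cross-entropy) loss."""
    w = [0.0] * 4
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * 4
        for f, y in zip(samples, labels):
            x = _vector(f)
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(4):
                grad[i] += (p - y) * x[i]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def prob_from_speaker(w: Sequence[float], f: Features) -> float:
    """Probability that the sound comes from a speaker (read here as a
    loudspeaker, which is an assumption of this sketch)."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, _vector(f))))


# Hypothetical, hand-made training data (1 = loudspeaker, 0 = live talker).
TRAIN = [
    (Features(0.0, 1.0, 0.6), 1),
    (Features(0.0, 0.0, 0.5), 1),
    (Features(0.0, 1.0, 0.4), 1),
    (Features(1.0, 0.0, 0.0), 0),
    (Features(1.0, 0.0, 0.1), 0),
    (Features(0.0, 0.0, 0.0), 0),
]

if __name__ == "__main__":
    weights = train_logistic([f for f, _ in TRAIN], [y for _, y in TRAIN])
    tv = extract_features([30, 31, 29, 30, 30, 30],
                          ["music", "speech", "music", "music", "music", "speech"])
    person = extract_features([-40, -20, 0, 15, 30, 35], ["speech"] * 6)
    print(tv, round(prob_from_speaker(weights, tv), 3))
    print(person, round(prob_from_speaker(weights, person), 3))
```

Under these assumptions, a static source that mixes music and speech and flips between them often scores above 0.5, while a moving, speech-only source scores below it. Claim 1 recites only "a machine learning model", so logistic regression is used here purely to keep the example dependency-free; a neural network taking the same three inputs could replace it.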