US 11,798,530 B2
Simultaneous acoustic event detection across multiple assistant devices
Matthew Sharifi, Kilchberg (CH); and Victor Carbune, Zurich (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Oct. 30, 2020, as Appl. No. 17/085,926.
Prior Publication US 2022/0139371 A1, May 5, 2022
Int. Cl. G10L 15/01 (2013.01); G01S 3/80 (2006.01); G10L 15/08 (2006.01); G10L 15/32 (2013.01); H04R 29/00 (2006.01)

CPC G10L 15/01 (2013.01) [G01S 3/8006 (2013.01); G10L 15/08 (2013.01); G10L 15/32 (2013.01); H04R 29/006 (2013.01); G10L 2015/088 (2013.01)]

16 Claims

1. A method implemented by one or more processors, the method comprising:

detecting, via one or more microphones of an assistant device located in an ecosystem that includes a plurality of assistant devices, audio data that captures an acoustic event, wherein the acoustic event comprises one or more of: glass breaking, a dog barking, a cat meowing, a doorbell ringing, a smoke alarm sounding, a carbon monoxide detector sounding, a baby crying, or a door knocking;

processing, using an acoustic event detection model that is stored locally at the assistant device and that is trained to detect whether one or more particular acoustic events are captured in the audio data, the audio data that captures the acoustic event to generate one or more corresponding measures associated with the acoustic event, wherein the one or more corresponding measures associated with the acoustic event comprise one or more corresponding confidence levels associated with whether the audio data is predicted to capture one or more of the particular acoustic events;

detecting, via one or more additional microphones of an additional assistant device located in the ecosystem, additional audio data that also captures the acoustic event, the additional assistant device being in addition to the assistant device, and the additional assistant device being co-located in the ecosystem with the assistant device;

processing, using an additional acoustic event detection model that is stored locally at the additional assistant device and that is trained to detect whether one or more particular acoustic events are captured in the additional audio data, the additional audio data that captures the acoustic event to generate one or more corresponding additional measures associated with the acoustic event, wherein the one or more corresponding additional measures associated with the acoustic event comprise one or more corresponding additional confidence levels associated with whether the additional audio data is predicted to capture one or more of the particular acoustic events;

determining, based on comparing the one or more confidence levels to a threshold confidence level and based on comparing the one or more additional confidence levels to the threshold confidence level or an additional threshold confidence level, whether the acoustic event detected by at least both the assistant device and the additional assistant device corresponds to an occurrence of an actual acoustic event from among the one or more particular acoustic events; and

in response to determining that the acoustic event corresponds to the occurrence of the actual acoustic event, causing an action associated with the actual acoustic event to be performed.
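
The determining and causing steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names DeviceDetection, confirm_event, and perform_action, the 0.7 default threshold, the use of ">=" for the comparison, and the rule that both devices must meet their thresholds are all assumptions made for illustration; the claim only requires that the determination be based on comparing each device's confidence levels to a threshold confidence level (or, for the additional device, an additional threshold confidence level).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DeviceDetection:
    """Hypothetical container for the output of one device's local model."""
    device_id: str
    confidences: dict[str, float]  # event label -> confidence level


def confirm_event(
    first: DeviceDetection,
    second: DeviceDetection,
    threshold: float = 0.7,  # assumed value, not taken from the patent
    additional_threshold: Optional[float] = None,
) -> Optional[str]:
    """Return the event label confirmed by both devices, or None.

    Assumed rule: a label is confirmed only when the first device's
    confidence meets `threshold` and the second device's confidence meets
    `additional_threshold` (which falls back to `threshold` if not given).
    """
    second_threshold = threshold if additional_threshold is None else additional_threshold
    shared_labels = set(first.confidences) & set(second.confidences)
    confirmed = sorted(
        label
        for label in shared_labels
        if first.confidences[label] >= threshold
        and second.confidences[label] >= second_threshold
    )
    if not confirmed:
        return None
    # If several labels pass, keep the one with the highest combined confidence.
    return max(confirmed, key=lambda l: first.confidences[l] + second.confidences[l])


def perform_action(
    first: DeviceDetection,
    second: DeviceDetection,
    actions: dict[str, Callable[[], str]],
    threshold: float = 0.7,
    additional_threshold: Optional[float] = None,
) -> Optional[str]:
    """Run the action registered for the confirmed event, if there is one."""
    label = confirm_event(first, second, threshold, additional_threshold)
    if label is None or label not in actions:
        return None
    return actions[label]()


# Example usage with made-up confidence levels from two co-located devices.
kitchen = DeviceDetection("kitchen_speaker", {"glass_breaking": 0.91, "doorbell": 0.12})
hallway = DeviceDetection("hallway_display", {"glass_breaking": 0.78, "doorbell": 0.05})
actions = {"glass_breaking": lambda: "notify_user:glass_breaking"}
result = perform_action(kitchen, hallway, actions)
```

Requiring agreement from both devices is one simple way to reduce false positives from a single device; other combination rules, such as averaging the confidence levels, would fit the same claim language equally well.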