US 12,073,319 B2
Sound model localization within an environment
Rajeev Conrad Nongpiur, Mountain View, CA (US); Byungchul Kim, Los Altos, CA (US); Marie Vachovsky, San Francisco, CA (US); Monica Song, La Canada Flintridge, CA (US); Khe Chai Sim, Dublin, CA (US); and Qian Zhang, Mountain View, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Jul. 27, 2020, as Appl. No. 16/940,294.
Prior Publication US 2022/0027725 A1, Jan. 27, 2022
Int. Cl. G06N 3/08 (2023.01); G06N 3/047 (2023.01); G10L 25/51 (2013.01)

CPC G06N 3/08 (2013.01) [G06N 3/047 (2023.01); G10L 25/51 (2013.01)]

20 Claims

1. A computer-implemented method performed by a data processing apparatus, the method comprising:

receiving, on a computing device in an environment, from devices in the environment, sound recordings made of sounds in the environment;

determining, by the computing device, preliminary labels for the sound recordings using pre-trained sound models, wherein each of the preliminary labels has an associated probability;

generating, by the computing device, sound clips with preliminary labels based on the sound recordings that have determined preliminary labels whose associated probability is over a high-recall threshold for the one of the pre-trained sound models that determined the preliminary label;

sending, by the computing device, the sound clips with preliminary labels to a user device;

presenting, by the user device, the sound clips with the preliminary labels, wherein:

a first sound clip is selected by a user as matching its preliminary label; and

a second sound clip is provided a second label, differing from its preliminary label, by the user;

receiving, by the computing device, labels for the sound clips from the user device, wherein:

a first label for the first sound clip matches its preliminary label; and

the second label for the second sound clip differs from its preliminary label;

generating, by the computing device, training data sets for the pre-trained sound models using the labeled sound clips; and

training the pre-trained sound models using the training data sets to generate localized sound models.
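
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The `Clip` class, the function names, the label names, and the threshold values are all hypothetical and chosen only for illustration. The sketch covers three steps: filtering by a per-model high-recall threshold, recording user labels that confirm or correct the preliminary label, and grouping labeled clips into per-model training data sets. Splitting each set into positive and negative examples is an assumption; the claim only requires that training data sets be generated from the labeled clips. The final training step is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    """Hypothetical record for one sound clip and its labels."""
    clip_id: str
    preliminary_label: str
    probability: float
    user_label: Optional[str] = None


def select_clips(predictions, thresholds):
    """Keep predictions whose probability is over the high-recall
    threshold of the model that produced the preliminary label."""
    return [
        Clip(clip_id, label, prob)
        for clip_id, label, prob in predictions
        if label in thresholds and prob > thresholds[label]
    ]


def apply_user_labels(clips, user_labels):
    """Attach labels returned from the user device. A user label may
    match the preliminary label or differ from it."""
    for clip in clips:
        clip.user_label = user_labels.get(clip.clip_id)
    return clips


def build_training_sets(clips, model_labels):
    """Group labeled clips per model. Assumption: a clip is a positive
    example for the model whose label the user assigned, and a negative
    example for every other model."""
    sets = {label: {"positive": [], "negative": []} for label in model_labels}
    for clip in clips:
        if clip.user_label is None:
            continue  # unlabeled clips are not used
        for label in model_labels:
            key = "positive" if clip.user_label == label else "negative"
            sets[label][key].append(clip.clip_id)
    return sets


# Hypothetical thresholds, set low so that few true events are missed.
thresholds = {"doorbell": 0.30, "dog_bark": 0.25}
predictions = [
    ("a", "doorbell", 0.90),
    ("b", "dog_bark", 0.40),
    ("c", "doorbell", 0.10),  # below threshold, never shown to the user
]
clips = select_clips(predictions, thresholds)
# The user confirms clip "a" and relabels clip "b" as a doorbell.
clips = apply_user_labels(clips, {"a": "doorbell", "b": "doorbell"})
training_sets = build_training_sets(clips, list(thresholds))
```

In this example clip "a" plays the role of the first sound clip in the claim (user label matches the preliminary label) and clip "b" plays the role of the second (user label differs). A low threshold favors recall over precision, leaving the user to correct the resulting false positives.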