US 11,682,385 B2
End-to-end streaming keyword spotting
Raziel Alvarez Guevara, Menlo Park, CA (US); Hyun Jin Park, Mountain View, CA (US); and Patrick Violette, Mountain View, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jun. 15, 2021, as Appl. No. 17/348,422.
Application 17/348,422 is a continuation of application No. 16/709,191, filed on Dec. 10, 2019, granted, now 11,056,101.
Application 16/709,191 is a continuation in part of application No. 16/439,897, filed on Jun. 13, 2019, granted, now 10,930,269, issued on Feb. 23, 2021.
Claims priority of provisional application 62/697,586, filed on Jul. 13, 2018.
Prior Publication US 2021/0312913 A1, Oct. 7, 2021
Int. Cl. G10L 15/16 (2006.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/22 (2006.01); G10L 15/08 (2006.01)

CPC G10L 15/16 (2013.01) [G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 2015/025 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01)]

20 Claims

1. A computer-implemented method that, when executed on data processing hardware, causes the data processing hardware to perform operations for training an end-to-end keyword spotting model, the operations comprising:

receiving a training input audio sequence that contains a keyword;

generating a plurality of sequential encoder windows over an expected location of the keyword contained in the training input audio sequence;

generating a decoder window in a time interval that includes an endpoint of the keyword;

for each encoder window in the plurality of sequential encoder windows, determining a max pooling loss at the corresponding encoder window;

determining a max pooling loss for the decoder window; and

optimizing the end-to-end keyword spotting model based on the max pooling losses determined for the plurality of sequential encoder windows and the max pooling loss determined for the decoder window.
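The training objective in claim 1 can be illustrated with a small Python sketch. It is not the patented implementation. It assumes per-frame class probabilities from the encoder and decoder and splits the expected keyword span into equal-width encoder windows, each with its own target label. The decoder window is assumed to be symmetric around the keyword's last frame. The two loss types are combined as an unweighted sum. All names (`max_pooling_loss`, `sequential_windows`, `keyword_spotting_loss`, `encoder_targets`, `decoder_margin`) are hypothetical. The sketch only computes the scalar to be minimized; the optimization step itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: probs[t][c] is the model's probability of class c at frame t.
Probs = Sequence[Sequence[float]]
Window = Tuple[int, int]  # half-open frame range [start, end)


def max_pooling_loss(probs: Probs, window: Window, target: int) -> float:
    """Cross-entropy at the single frame in `window` where `target` is most probable."""
    start, end = window
    if not 0 <= start < end <= len(probs):
        raise ValueError("window must be a non-empty range inside the sequence")
    best = max(probs[t][target] for t in range(start, end))
    return -math.log(max(best, 1e-12))  # clamp to avoid log(0)


def sequential_windows(start: int, end: int, count: int) -> List[Window]:
    """Split the expected keyword span [start, end) into `count` consecutive windows."""
    if count <= 0 or end - start < count:
        raise ValueError("span must hold at least one frame per window")
    bounds = [start + (end - start) * i // count for i in range(count + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def keyword_spotting_loss(
    encoder_probs: Probs,
    decoder_probs: Probs,
    keyword_span: Window,
    encoder_targets: Sequence[int],
    decoder_target: int,
    decoder_margin: int,
) -> float:
    """Sum of the per-encoder-window max pooling losses plus the decoder-window loss."""
    start, end = keyword_span
    windows = sequential_windows(start, end, len(encoder_targets))
    # One max pooling loss per sequential encoder window, each with its own target.
    encoder_loss = sum(
        max_pooling_loss(encoder_probs, window, target)
        for window, target in zip(windows, encoder_targets)
    )
    # Decoder window: frames around the keyword endpoint (its last frame, end - 1).
    endpoint = end - 1
    decoder_window = (
        max(0, endpoint - decoder_margin),
        min(len(decoder_probs), endpoint + decoder_margin + 1),
    )
    decoder_loss = max_pooling_loss(decoder_probs, decoder_window, decoder_target)
    return encoder_loss + decoder_loss


if __name__ == "__main__":
    flat = [[0.5, 0.5]] * 8  # every class at probability 0.5 on every frame
    loss = keyword_spotting_loss(
        flat, flat, keyword_span=(2, 6), encoder_targets=[0, 1],
        decoder_target=1, decoder_margin=1,
    )
    print(f"{loss:.4f}")  # prints 2.0794 (three windows, each contributing -log 0.5)
```

Taking the maximum inside each window lets training use the single most confident frame. As a result, the labels only need to locate each window roughly rather than align every frame. A real trainer would minimize this scalar with a gradient-based optimizer in an automatic-differentiation framework.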