US 11,705,106 B2
On-device speech synthesis of textual segments for training of on-device speech recognition model
Françoise Beaufays, Mountain View, CA (US); Johan Schalkwyk, Scarsdale, NY (US); and Khe Chai Sim, Dublin, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 20, 2021, as Appl. No. 17/479,285.
Application 17/479,285 is a continuation of application No. 16/959,546, granted, now 11,127,392, previously published as PCT/US2019/054314, filed on Oct. 2, 2019.
Claims priority of provisional application 62/872,140, filed on Jul. 9, 2019.
Prior Publication US 2022/0005458 A1, Jan. 6, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 13/047 (2013.01); G10L 15/06 (2013.01)
CPC G10L 13/047 (2013.01) [G10L 15/063 (2013.01); G10L 2015/0635 (2013.01)] 19 Claims
 
1. A client device comprising:
at least one microphone;
at least one display;
local storage storing a textual segment, an end-to-end speech recognition model, and a speech synthesis model;
one or more processors executing locally stored instructions to cause one or more of the processors to:
identify the textual segment;
generate synthesized speech audio data that includes synthesized speech of the identified textual segment, wherein in generating the synthesized speech audio data one or more of the processors are to process the textual segment using the speech synthesis model;
process, using the end-to-end speech recognition model, the synthesized speech audio data to generate a predicted textual segment;
generate a gradient based on comparing the predicted textual segment to the textual segment; and
update one or more weights of the end-to-end speech recognition model based on the generated gradient.
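The loop recited in claim 1 (synthesize the stored textual segment, recognize the synthesized audio, compare the prediction with the original text, then update the recognizer's weights) can be illustrated with a minimal, self-contained sketch. Everything below is a hypothetical toy and not the patented implementation. The names `synthesize`, `recognize`, `loss_and_gradient`, and `train_on_device` are illustrative. The "speech synthesis model" emits one noisy one-hot feature frame per character instead of real audio. The "end-to-end speech recognition model" is a single per-frame linear layer with a softmax. The comparison is a frame-wise cross-entropy against the ground-truth characters. A real end-to-end recognizer would consume genuine synthesized audio and would need a sequence loss that handles alignment, for example CTC or RNN-T.

```python
import math
import random

# Hypothetical toy vocabulary: lowercase letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
DIM = len(ALPHABET)


def synthesize(text, seed=0):
    """Toy stand-in for the on-device speech synthesis model (assumption).

    Emits one feature frame per character: a one-hot vector for that
    character plus small Gaussian noise. Real synthesized speech would be
    waveform or spectrogram frames; this is only a placeholder.
    """
    rng = random.Random(seed)
    frames = []
    for ch in text:
        frame = [rng.gauss(0.0, 0.01) for _ in range(DIM)]
        frame[ALPHABET.index(ch)] += 1.0
        frames.append(frame)
    return frames


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(weights, frames):
    """Toy recognizer (assumption): per-frame linear layer plus softmax.

    Returns per-frame character probabilities and the predicted textual segment.
    """
    probs = []
    for frame in frames:
        logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
        probs.append(softmax(logits))
    predicted = "".join(ALPHABET[max(range(DIM), key=p.__getitem__)] for p in probs)
    return probs, predicted


def loss_and_gradient(frames, probs, text):
    """Compare predictions with the original textual segment (cross-entropy).

    Returns (mean loss, gradient w.r.t. the weights), using
    d(loss)/d(logit_k) = p_k - y_k for each frame.
    """
    grad = [[0.0] * DIM for _ in range(DIM)]
    loss = 0.0
    for frame, p, ch in zip(frames, probs, text):
        target = ALPHABET.index(ch)
        loss -= math.log(p[target])
        for k in range(DIM):
            coeff = p[k] - (1.0 if k == target else 0.0)
            for d in range(DIM):
                grad[k][d] += coeff * frame[d]
    n = len(text)
    return loss / n, [[g / n for g in row] for row in grad]


def train_on_device(text, steps=100, lr=0.5, seed=0):
    """Synthesize, recognize, compute a gradient, and update weights, `steps` times."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.01) for _ in range(DIM)] for _ in range(DIM)]
    frames = synthesize(text, seed)
    losses = []
    for _ in range(steps):
        probs, _ = recognize(weights, frames)
        loss, grad = loss_and_gradient(frames, probs, text)
        losses.append(loss)
        # Plain gradient-descent update of the recognizer's weights.
        weights = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(weights, grad)]
    _, predicted = recognize(weights, frames)
    return weights, predicted, losses
```

For example, `train_on_device("hello world")` runs the loop entirely in memory, with no transcribed human audio involved, and returns the updated weights, the final predicted textual segment, and the per-step loss history. Because the toy features are one-hot and the learning rate is small relative to the loss curvature, the loss decreases at every step. That property comes from the toy setup, not from the claim.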