US 11,676,613 B2
Speech coding using auto-regressive generative neural networks
Willem Bastiaan Kleijn, Lower Hutt (NZ); Jan K. Skoglund, San Francisco, CA (US); Alejandro Luebs, San Francisco, CA (US); and Sze Chie Lim, San Francisco, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on May 27, 2021, as Appl. No. 17/332,898.
Application 17/332,898 is a continuation of application No. 16/206,823, filed on Nov. 30, 2018, granted, now Pat. No. 11,024,321.
Prior Publication US 2021/0366495 A1, Nov. 25, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 19/02 (2013.01); G10L 25/30 (2013.01)
CPC G10L 19/0204 (2013.01) [G10L 25/30 (2013.01)]
20 Claims
1. A method performed by one or more data processing apparatus on a client device, the method comprising:
obtaining, by the client device, a bitstream of parameters characterizing spoken speech over a data communication network;
generating, by the client device and from the parameters, a conditioning sequence;
generating, by the client device, a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step:
processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the current reconstruction sequence includes the speech samples at each time step preceding the decoder time step, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and
sampling a speech sample from the possible speech sample values as the speech sample at the decoder time step.
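
The decoding loop recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The trained auto-regressive generative neural network is replaced by a hypothetical placeholder function, toy_score_distribution, and the 256-level sample alphabet, the repetition-based upsampling in make_conditioning_sequence, and the weights vector are all assumptions made only for illustration.

```python
import numpy as np

NUM_LEVELS = 256  # assumption: samples are quantized to 256 possible values


def make_conditioning_sequence(params, samples_per_frame):
    """Repeat each per-frame parameter vector so that every decoder
    time step has one conditioning vector (an assumed upsampling scheme)."""
    params = np.asarray(params, dtype=np.float64)
    return np.repeat(params, samples_per_frame, axis=0)


def toy_score_distribution(history, cond_vec, weights):
    """Hypothetical stand-in for the auto-regressive network.

    Returns a probability distribution over the NUM_LEVELS possible
    sample values, given the samples generated so far and the
    conditioning vector for the current time step."""
    prev = history[-1] if history else NUM_LEVELS // 2
    levels = np.arange(NUM_LEVELS)
    # Favor values near the previous sample, shifted by the conditioning.
    center = prev + float(cond_vec @ weights)
    logits = -0.5 * ((levels - center) / 8.0) ** 2
    logits -= logits.max()  # numerical stability before exponentiation
    probs = np.exp(logits)
    return probs / probs.sum()


def decode(params, samples_per_frame, weights, seed=0):
    """Generate one speech sample per decoder time step."""
    rng = np.random.default_rng(seed)
    cond = make_conditioning_sequence(params, samples_per_frame)
    reconstruction = []  # current reconstruction sequence
    for t in range(cond.shape[0]):
        # Score distribution over possible values, conditioned on cond[t].
        probs = toy_score_distribution(reconstruction, cond[t], weights)
        # Sample the speech sample for this time step from the distribution.
        sample = int(rng.choice(NUM_LEVELS, p=probs))
        reconstruction.append(sample)
    return np.array(reconstruction, dtype=np.int64)


if __name__ == "__main__":
    # Two frames of two made-up parameters each, four samples per frame.
    received_params = np.array([[0.1, -0.2], [0.3, 0.0]])
    print(decode(received_params, samples_per_frame=4,
                 weights=np.array([1.0, 2.0])))
```

In this sketch each new sample is drawn from the distribution rather than taken as its most likely value, which mirrors the sampling step of the claim. A real decoder would replace the placeholder with a trained network and parse the parameters from the received bitstream.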