CPC G10L 19/0204 (2013.01) [G10L 25/30 (2013.01)] | 20 Claims |
1. A method performed by one or more data processing apparatus on a client device, the method comprising:
obtaining, by the client device, a bitstream of parameters characterizing spoken speech over a data communication network;
generating, by the client device and from the parameters, a conditioning sequence;
generating, by the client device, a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step:
processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the current reconstruction sequence includes the speech samples at each time step preceding the decoder time step, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and
sampling a speech sample from the possible speech sample values as the speech sample at the decoder time step.
|
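The decoding loop recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function `toy_score_distribution` is a hypothetical stand-in for the auto-regressive generative neural network, and the 256-level quantization, the repeat-based upsampling in `make_conditioning_sequence`, and all names and parameters are assumptions made only for illustration.

```python
import math
import random

NUM_LEVELS = 256  # assumption: speech samples quantized to 8-bit values


def make_conditioning_sequence(params, steps_per_frame):
    """Repeat each received parameter so every decoder time step has a conditioning value."""
    return [p for p in params for _ in range(steps_per_frame)]


def toy_score_distribution(history, cond):
    """Hypothetical stand-in for the network: probabilities over NUM_LEVELS sample values.

    The distribution is centered between the previous sample and the
    conditioning value, so it depends on both inputs.
    """
    prev = history[-1] if history else NUM_LEVELS // 2
    center = 0.5 * prev + 0.5 * cond
    logits = [-((v - center) ** 2) / 200.0 for v in range(NUM_LEVELS)]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def decode(params, steps_per_frame=4, seed=0):
    """Generate one speech sample per decoder time step."""
    rng = random.Random(seed)
    conditioning = make_conditioning_sequence(params, steps_per_frame)
    reconstruction = []  # current reconstruction sequence: all samples so far
    for t in range(len(conditioning)):
        # Score distribution over possible values, conditioned on step t.
        probs = toy_score_distribution(reconstruction, conditioning[t])
        # Sample the speech sample for this time step from that distribution.
        sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        reconstruction.append(sample)
    return reconstruction
```

Calling `decode([100, 200])` with the default `steps_per_frame=4` yields eight samples, one per decoder time step. Each sample is drawn from a distribution that depends on the samples generated before it and on the conditioning value for that step.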