US 11,869,530 B2
Generating audio using neural networks
Aaron Gerard Antonius van den Oord, London (GB); Sander Etienne Lea Dieleman, London (GB); Nal Emmerich Kalchbrenner, North Holland (NL); Karen Simonyan, London (GB); and Oriol Vinyals, London (GB)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on Jun. 13, 2022, as Appl. No. 17/838,985.
Application 17/838,985 is a continuation of application No. 17/020,348, filed on Sep. 14, 2020, granted, now 11,386,914.
Application 17/020,348 is a continuation of application No. 16/390,549, filed on Apr. 22, 2019, granted, now 10,803,884, issued on Oct. 13, 2020.
Application 16/390,549 is a continuation of application No. 16/030,742, filed on Jul. 9, 2018, granted, now 10,304,477, issued on May 28, 2019.
Application 16/030,742 is a continuation of application No. PCT/US2017/050320, filed on Sep. 6, 2017.
Claims priority of provisional application 62/384,115, filed on Sep. 6, 2016.
Prior Publication US 2022/0319533 A1, Oct. 6, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 25/30 (2013.01); G10L 13/06 (2013.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01); G06N 3/04 (2023.01)
CPC G10L 25/30 (2013.01) [G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01); G10L 13/06 (2013.01); G10H 2250/311 (2013.01)]
20 Claims
1. A neural network system implemented by one or more computers,
wherein the neural network system is configured to autoregressively generate an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps, and
wherein the neural network system comprises:
a convolutional subnetwork comprising one or more audio-processing convolutional neural network layers, wherein the convolutional subnetwork is configured to, for each of the plurality of time steps:
receive a current sequence of audio data that comprises the respective audio sample at each of multiple time steps that precede the time step in the output sequence, and
process the current sequence of audio data to generate an alternative representation for the time step;
wherein the neural network system is configured to process the alternative representations for the time steps to generate the output sequence of audio data.
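
For illustration only, the autoregressive loop recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not recite the dilated kernel-size-2 convolutions, the tanh nonlinearity, the one-hot input encoding, the softmax output over quantized amplitude values, the sampling step, or the seed sample; all of these are assumed here. The names NUM_LEVELS, CHANNELS, init_params, alternative_representation, and generate are hypothetical, and the weights are random and untrained.

```python
import numpy as np

NUM_LEVELS = 16  # assumed number of quantized amplitude values (hypothetical)
CHANNELS = 8     # assumed hidden width (hypothetical)


def init_params(rng, dilations=(1, 2, 4)):
    """Create random, untrained weights; shapes and dilations are illustrative."""
    layers = []
    in_ch = NUM_LEVELS
    for d in dilations:
        # Kernel size 2: tap 0 reads the value d steps back, tap 1 the current one.
        layers.append((d, rng.normal(0.0, 0.1, size=(2, in_ch, CHANNELS))))
        in_ch = CHANNELS
    w_out = rng.normal(0.0, 0.1, size=(CHANNELS, NUM_LEVELS))
    return layers, w_out


def alternative_representation(current_sequence, layers):
    """Convolutional subnetwork: map the samples that precede the time step
    being generated to a feature vector for that time step."""
    x = np.eye(NUM_LEVELS)[current_sequence]  # one-hot, shape (T, NUM_LEVELS)
    for d, w in layers:
        # Left-pad with zeros so each position sees only itself and earlier ones.
        padded = np.concatenate([np.zeros((d, x.shape[1])), x], axis=0)
        x = np.tanh(padded[:-d] @ w[0] + padded[d:] @ w[1])
    return x[-1]  # features computed from the most recent context


def generate(num_steps, seed=0):
    """Autoregressively produce num_steps quantized audio samples."""
    rng = np.random.default_rng(seed)
    layers, w_out = init_params(rng)
    samples = [NUM_LEVELS // 2]  # assumed seed sample (mid-scale value)
    for _ in range(num_steps):
        h = alternative_representation(np.array(samples), layers)
        logits = h @ w_out  # assumed output layer: scores per quantized value
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # The sampled value joins the context used for the next time step.
        samples.append(int(rng.choice(NUM_LEVELS, p=probs)))
    return samples[1:]  # drop the seed sample


if __name__ == "__main__":
    print(generate(10))
```

Each pass through the loop in generate corresponds to one time step of the claim: the subnetwork receives the current sequence, produces the alternative representation, and the system turns that representation into the next audio sample, which is appended before the next pass.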