US 11,990,148 B2
Compressing audio waveforms using neural networks and vector quantizers
Neil Zeghidour, Paris (FR); Marco Tagliasacchi, Kilchberg (CH); and Dominik Roblek, Meilen (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Feb. 6, 2023, as Appl. No. 18/106,094.
Application 18/106,094 is a continuation of application No. 17/856,856, filed on Jul. 1, 2022, granted, now 11,600,282.
Claims priority of provisional application 63/218,139, filed on Jul. 2, 2021.
Prior Publication US 2023/0186927 A1, Jun. 15, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 19/038 (2013.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G10L 19/00 (2013.01); G10L 25/30 (2013.01)

CPC G10L 19/038 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G10L 25/30 (2013.01); G10L 2019/0002 (2013.01)]

22 Claims

1. A method performed by one or more computers, the method comprising:

receiving a compressed representation of an audio waveform;

decompressing the compressed representation of the audio waveform to obtain a respective coded representation of each of a plurality of feature vectors representing the audio waveform,

wherein the coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from a respective codebook of each vector quantizer in a sequence of vector quantizers, that define a quantized representation of the feature vector, and

wherein for each of the plurality of feature vectors, the sequence of vector quantizers has generated the coded representation of the feature vector by performing operations comprising:

for a first vector quantizer in the sequence of vector quantizers:

receiving the feature vector;

identifying, based on the feature vector, a respective code vector from the codebook of the vector quantizer to represent the feature vector; and

determining a current residual vector based on an error between: (i) the feature vector, and (ii) the code vector that represents the feature vector;

wherein the coded representation of the feature vector identifies the code vector that represents the feature vector;

generating a respective quantized representation of each feature vector from the coded representation of the feature vector; and

processing the quantized representations of the feature vectors using a decoder neural network to generate an output audio waveform.
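
The quantizer operations recited in claim 1 can be illustrated with a minimal sketch of residual vector quantization. This is not the patented implementation, and it rests on several assumptions the claim text above does not spell out: the code vector is chosen by smallest squared Euclidean distance, each later quantizer in the sequence quantizes the current residual left by the previous one, and the quantized representation is the sum of the identified code vectors. The function names (`nearest_code_index`, `rvq_encode`, `rvq_dequantize`) are hypothetical, and the decoder neural network is omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest_code_index(vector: Vector, codebook: Codebook) -> int:
    """Index of the code vector closest to `vector`.

    Assumption: closeness is squared Euclidean distance.
    """
    def sq_dist(code: Vector) -> float:
        return sum((v - c) ** 2 for v, c in zip(vector, code))

    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(feature: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Coded representation: one code-vector index per quantizer."""
    residual = list(feature)  # the first quantizer receives the feature vector
    indices: List[int] = []
    for codebook in codebooks:
        idx = nearest_code_index(residual, codebook)
        indices.append(idx)
        # Current residual: error between the quantizer's input and
        # the code vector chosen to represent it.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_dequantize(indices: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Quantized representation rebuilt from the coded representation.

    Assumption: the identified code vectors are summed.
    """
    quantized = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        quantized = [q + c for q, c in zip(quantized, codebook[idx])]
    return quantized
```

In this sketch, `rvq_encode` plays the role of the sequence of vector quantizers on the compression side, and `rvq_dequantize` corresponds to the step of generating a quantized representation from the coded representation. Its output for each feature vector is what a decoder network would consume to produce the output waveform.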