US 11,948,062 B2
Compressed recurrent neural network models
Ouais Alsharif, Mountain View, CA (US); Rohit Prakash Prabhavalkar, Santa Clara, CA (US); Ian C. McGraw, Menlo Park, CA (US); and Antoine Jean Bruguier, Milpitas, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 4, 2020, as Appl. No. 17/112,966.
Application 17/112,966 is a continuation of application No. 15/394,617, filed on Dec. 29, 2016, granted, now 10,878,319.
Claims priority of provisional application 62/290,624, filed on Feb. 3, 2016.
Prior Publication US 2021/0089916 A1, Mar. 25, 2021
Int. Cl. G06N 3/044 (2023.01); G06N 3/049 (2023.01); G06N 3/08 (2023.01); G06N 20/00 (2019.01); G06F 17/16 (2006.01); G06N 3/04 (2023.01); G06N 3/084 (2023.01)
CPC G06N 3/044 (2023.01) [G06N 3/049 (2013.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G05B 2219/33025 (2013.01); G05B 2219/40326 (2013.01); G06F 17/16 (2013.01); G06N 3/04 (2013.01); G06N 3/084 (2013.01)] 22 Claims
OG exemplary drawing
 
1. A computer-implemented method that, when executed on data processing hardware, causes the data processing hardware to perform operations comprising:
re-configuring an uncompressed version of a trained recurrent neural network (RNN) to generate a compressed version of the trained RNN by replacing a particular uncompressed trained recurrent layer of a plurality of uncompressed recurrent layers of the uncompressed version of the trained RNN with a corresponding compressed trained recurrent layer by:
generating a first compressed weight matrix, Z_h^l, and a projection matrix, P^l, based on an uncompressed recurrent weight matrix, W_h, containing trained recurrent weights for the particular uncompressed trained recurrent layer of the trained RNN;
generating a second compressed weight matrix, Z_x^l, based on the first compressed weight matrix, Z_h^l, and the projection matrix, P^l; and
generating, based on a product of the first compressed weight matrix, Z_h^l, and the projection matrix, P^l, a compressed trained recurrent weight matrix that replaces the uncompressed recurrent weight matrix, W_h, in the corresponding compressed trained recurrent layer, the compressed trained recurrent weight matrix comprising fewer parameters than the uncompressed recurrent weight matrix, W_h; and
transmitting the compressed version of the trained RNN having the corresponding compressed trained recurrent layer to a device, the compressed version of the trained RNN having the corresponding compressed trained recurrent layer configured to receive a respective neural network input at each of multiple time steps and generate a respective neural network output at each of the multiple time steps,
wherein, after compressing the particular uncompressed trained recurrent layer of the plurality of uncompressed recurrent layers, the compressed version of the trained RNN comprises fewer parameters than the uncompressed version of the trained RNN.
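The factorization recited in the claim can be illustrated with a short sketch. The function below is a hypothetical implementation, not the patented method itself: it factors a trained recurrent weight matrix W_h into a first compressed matrix Z_h and a projection matrix P via truncated SVD, then derives a second compressed matrix Z_x for a following layer's input weights by projecting onto P. All names (compress_recurrent_layer, W_x, rank) are illustrative assumptions.

```python
import numpy as np

def compress_recurrent_layer(W_h, W_x, rank):
    """Low-rank compression sketch.

    W_h : (n, n) recurrent weight matrix, approximated as Z_h @ P.
    W_x : (m, n) inter-layer weight matrix, approximated as Z_x @ P
          by reusing the same projection matrix P.
    rank: number of retained singular values (k < n).
    """
    # Truncated SVD: W_h ≈ U_k diag(s_k) Vt_k.
    U, s, Vt = np.linalg.svd(W_h, full_matrices=False)
    Z_h = U[:, :rank] * s[:rank]   # first compressed matrix, (n, k)
    P = Vt[:rank, :]               # projection matrix, (k, n), orthonormal rows
    # Since the rows of P are orthonormal, the least-squares fit of
    # W_x ≈ Z_x @ P is Z_x = W_x @ P.T, shape (m, k).
    Z_x = W_x @ P.T
    return Z_h, P, Z_x
```

With k well below n, the factors Z_h (n·k), P (k·n), and Z_x (m·k) together hold fewer parameters than the original W_h (n·n) and W_x (m·n), which is the parameter reduction the claim requires of the compressed layer.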