US 12,353,983 B2
Inference device and method for reducing the memory usage in a weight matrix
Kenya Sugihara, Tokyo (JP)
Assigned to MITSUBISHI ELECTRIC CORPORATION, Tokyo (JP)
Filed by Mitsubishi Electric Corporation, Tokyo (JP)
Filed on Jun. 25, 2021, as Appl. No. 17/358,167.
Application 17/358,167 is a continuation of application No. PCT/JP2019/000612, filed on Jan. 11, 2019.
Prior Publication US 2021/0319299 A1, Oct. 14, 2021
Int. Cl. G06N 3/063 (2023.01); G06F 7/523 (2006.01); G06F 7/78 (2006.01)
CPC G06N 3/063 (2013.01) [G06F 7/523 (2013.01); G06F 7/78 (2013.01)] 5 Claims
OG exemplary drawing
 
1. An inference device comprising:
a memory to store a layer on an input side and weight data for performing a matrix multiplication using the layer on the input side; and
a processor to generate a layer on an output side by using the layer on the input side and the weight data for performing the matrix multiplication using the layer on the input side,
wherein the inference device uses a neural network, and in the neural network, out of a plurality of rows and columns including zero elements and non-zero elements in the weight data, the inference device reduces the amount of the memory used for storing learned weight data of the matrix by:
storing only weights that give the non-zero elements and position information of the weights that give the non-zero elements in the memory,
setting approximately the same number of the non-zero elements in each of the rows, and
setting approximately the same number of the non-zero elements in each of the columns, and
in a case where layers of convolutional filters of a convolutional neural network (CNN) correspond to the layer on the input side, a matrix is obtained in which a height, a width, and input channels are included in one row and the number of output channels is the number of the rows, and the weight data is stored in the memory as the weights that give the non-zero elements out of the plurality of rows and columns to be multiplied with the matrix.
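The storage scheme recited in the claim, keeping only the non-zero weights and their positions, with approximately the same number of non-zero elements per row, can be sketched as follows. This is an illustrative NumPy sketch, not the patented implementation: the function names are invented here, the non-zeros are chosen by magnitude pruning (a common heuristic the patent does not specify), and for brevity only the per-row balance is enforced, whereas the claim additionally balances the non-zeros across columns.

```python
import numpy as np

def store_sparse_balanced(weights, nnz_per_row):
    """Keep only non-zero weights and their column positions.

    Sketch of the claimed memory reduction: each row stores exactly
    `nnz_per_row` weights (here the largest-magnitude ones), so every
    row holds approximately the same number of non-zero elements, and
    the dense matrix never needs to be kept in memory.
    """
    values = np.zeros((weights.shape[0], nnz_per_row), dtype=weights.dtype)
    positions = np.zeros((weights.shape[0], nnz_per_row), dtype=np.int32)
    for r, row in enumerate(weights):
        # Columns of the nnz_per_row largest-magnitude entries in this row.
        cols = np.sort(np.argsort(np.abs(row))[-nnz_per_row:])
        positions[r] = cols
        values[r] = row[cols]
    return values, positions

def sparse_matvec(values, positions, x):
    """Multiply the stored sparse matrix by an input-side layer vector x."""
    return np.array([v @ x[p] for v, p in zip(values, positions)])
```

For an m-by-n matrix with k non-zeros per row, this stores m*k weights plus m*k positions instead of m*n weights; the equal per-row count also keeps the access pattern regular, which is what makes a hardware inference device simpler.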
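The CNN case in the claim describes flattening each convolutional filter so that its height, width, and input channels occupy one row, with the number of output channels giving the number of rows. A minimal sketch, assuming the common (out_channels, in_channels, height, width) filter layout (the function name and layout convention are illustrative, not from the patent):

```python
import numpy as np

def filters_to_matrix(filters):
    """Flatten CNN filters of shape (out_ch, in_ch, h, w) into a matrix.

    Each row holds one output channel's filter with the height, width,
    and input channels flattened together, so the number of rows equals
    the number of output channels, as recited in the claim.
    """
    out_ch = filters.shape[0]
    return filters.reshape(out_ch, -1)
```

Once the filters are in this matrix form, the convolution can be computed as a matrix multiplication against correspondingly flattened input patches (the familiar im2col arrangement, which the claim does not name but which this layout matches), and the sparse row/column storage above applies to the resulting weight matrix unchanged.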