CPC G06N 3/08 (2013.01) [G06F 12/0207 (2013.01); G06F 17/153 (2013.01); G06F 17/16 (2013.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 20/10 (2019.01); H03M 7/30 (2013.01)] | 22 Claims |
1. A method for optimizing a neural network that includes a convolution layer, implemented by a device comprising at least one processor and memory, the method comprising:
allocating, by the at least one processor, from the memory, a memory space;
receiving, by the at least one processor, input data of the convolution layer;
compressing, by the at least one processor and along a channel direction of the convolution layer, the input data into compressed input data of the convolution layer;
compressing, by the at least one processor and along the channel direction of the convolution layer, convolution kernels of the convolution layer into compressed convolution kernels;
dividing, by the at least one processor, the compressed input data of the convolution layer into blocks of the compressed input data, each block being of the same size as one of the compressed convolution kernels;
generating, by the at least one processor, output data of the convolution layer by performing convolution computation on the blocks of the compressed input data and the compressed convolution kernels; and
storing the output data of the convolution layer into the memory space, wherein
the output data of the convolution layer includes multiple elements each of which corresponds to one of the blocks of the compressed input data,
each of the multiple elements is obtained by performing the convolution computation on one of the blocks of the compressed input data using the compressed convolution kernels, and
the storing the output data of the convolution layer comprises storing respectively the multiple elements of the output data of the convolution layer into the memory space.
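The claimed steps can be illustrated with a minimal NumPy sketch. All names here are hypothetical, and the channel-direction compression is modeled as a simple sum over the channel axis purely as a stand-in, since the claim does not specify the compression operator: the input and the kernels are each compressed along the channel direction, the compressed input is tiled into kernel-sized blocks, and each output element is computed from one block and stored into the pre-allocated output buffer (the claim's "memory space").

```python
import numpy as np

def compress_channels(x):
    # Hypothetical stand-in for the claimed channel-direction compression:
    # sum over the channel axis, reducing (C, H, W) to (H, W). The patent
    # does not specify the actual compression operator.
    return x.sum(axis=0)

def convolve_compressed(input_data, kernels, stride=1):
    """Sketch of the claimed flow: compress input and kernels along the
    channel direction, divide the compressed input into blocks the same
    size as a compressed kernel, and compute one output element per block."""
    comp_in = compress_channels(input_data)             # (H, W)
    comp_ks = [compress_channels(k) for k in kernels]   # each (kh, kw)
    kh, kw = comp_ks[0].shape
    H, W = comp_in.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    # Pre-allocated output buffer, playing the role of the "memory space".
    output = np.empty((len(comp_ks), out_h, out_w))
    for c, k in enumerate(comp_ks):
        for i in range(out_h):
            for j in range(out_w):
                # Block of compressed input, same size as the compressed kernel.
                block = comp_in[i*stride:i*stride+kh, j*stride:j*stride+kw]
                # One output element per block, stored individually.
                output[c, i, j] = np.sum(block * k)
    return output
```

In this reading, the per-element storage in the final wherein clause corresponds to writing each `output[c, i, j]` as its block's convolution completes, rather than materializing the full uncompressed intermediate.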