US 12,333,428 B2
Neural network model processing method and apparatus
Zhicheng Sui, Shenzhen (CN); Li Zhou, Shenzhen (CN); Lei Zhao, Shenzhen (CN); Mohan Liu, Shenzhen (CN); Qinghua Yu, Shanghai (CN); and Hongrui Jiang, Shenzhen (CN)
Assigned to HUAWEI TECHNOLOGIES CO., LTD., Shenzhen (CN)
Appl. No. 17/434,563
Filed by Huawei Technologies Co., Ltd., Shenzhen (CN)
PCT Filed Feb. 27, 2019, PCT No. PCT/CN2019/076374
§ 371(c)(1), (2) Date Aug. 27, 2021,
PCT Pub. No. WO2020/172829, PCT Pub. Date Sep. 3, 2020.
Prior Publication US 2022/0121936 A1, Apr. 21, 2022
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)]
17 Claims
1. A neural network model processing method, the neural network model processing method comprising:
obtaining a first low-bit neural network model through training, wherein the first low-bit neural network model comprises at least three operation layers, wherein the at least three operation layers comprise a first operation layer and a second operation layer, wherein each of the at least three operation layers comprises at least one operation, wherein one or more values of one or more of a parameter or data used for the at least one operation are represented using N bits, and wherein N is a positive integer less than eight;
compressing the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least two operation layers, wherein the at least two operation layers comprise a third operation layer, wherein the third operation layer is equivalent to a combination of the first operation layer and the second operation layer, and wherein an operation layer other than the third operation layer in the at least two operation layers is the same as an operation layer other than the first operation layer and the second operation layer in the at least three operation layers;
searching the at least three operation layers for the first operation layer and the second operation layer;
combining the first operation layer and the second operation layer to obtain the third operation layer, wherein an input of the first operation layer is the same as an input of the third operation layer, wherein an output of the first operation layer is an input of the second operation layer, and wherein an output of the second operation layer is the same as an output of the third operation layer; and
constructing the second low-bit neural network model based on the third operation layer and the operation layer other than the first operation layer and the second operation layer in the at least three operation layers.
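The searching, combining, and constructing steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy model in which every operation layer is an integer affine map y = scale * x + shift, so that two consecutive layers can be folded into one equivalent layer. The names `Layer`, `mergeable`, `find_pair`, `combine`, and `compress` are hypothetical, and the sketch does not model the N-bit representation of parameters or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """Hypothetical operation layer computing y = scale * x + shift."""
    name: str
    scale: int
    shift: int
    mergeable: bool = True  # assumption: flags layers that may be combined

    def run(self, x: int) -> int:
        return self.scale * x + self.shift


def find_pair(layers: List[Layer]) -> Optional[int]:
    """Search for the first adjacent (first, second) pair that can be combined."""
    for i in range(len(layers) - 1):
        if layers[i].mergeable and layers[i + 1].mergeable:
            return i
    return None


def combine(first: Layer, second: Layer) -> Layer:
    """Build a third layer equivalent to running `first` and then `second`.

    second(first(x)) = s2 * (s1 * x + b1) + b2 = (s2 * s1) * x + (s2 * b1 + b2)
    """
    return Layer(
        name=f"{first.name}+{second.name}",
        scale=second.scale * first.scale,
        shift=second.scale * first.shift + second.shift,
    )


def compress(layers: List[Layer]) -> List[Layer]:
    """Construct the second model: the combined layer plus the untouched layers."""
    i = find_pair(layers)
    if i is None:
        return list(layers)
    third = combine(layers[i], layers[i + 1])
    return layers[:i] + [third] + layers[i + 2:]


def run_model(layers: List[Layer], x: int) -> int:
    for layer in layers:
        x = layer.run(x)
    return x


# First model: three operation layers; the last two are combinable.
first_model = [
    Layer("other", scale=3, shift=1, mergeable=False),
    Layer("first", scale=2, shift=-1),
    Layer("second", scale=5, shift=4),
]
second_model = compress(first_model)
```

In this sketch, `second_model` has two layers: the unchanged `other` layer and a combined layer whose input is the input of `first` and whose output is the output of `second`. Running both models on the same integer input gives the same result, which is the sense of "equivalent" assumed here.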