US 11,704,555 B2
Batch normalization layer fusion and quantization method for model inference in AI neural network engine
Min Guo, Sunnyvale, CA (US)
Assigned to BAIDU USA LLC, Sunnyvale, CA (US)
Filed by Baidu USA LLC, Sunnyvale, CA (US)
Filed on Jun. 24, 2019, as Appl. No. 16/450,716.
Prior Publication US 2020/0401884 A1, Dec. 24, 2020
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01); G06N 20/10 (2019.01); G06N 3/063 (2023.01); G06N 3/048 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01); G06N 3/048 (2023.01); G06N 3/063 (2013.01); G06N 20/10 (2019.01); G06N 3/045 (2023.01)] 18 Claims
OG exemplary drawing
 
13. A data processor, comprising:
one or more memories to receive and store input data; and
a processing core coupled to the one or more memories to classify the input data using a neural network (NN) having a plurality of NN layers, wherein each of the NN layers includes a merged batch normalization (BN) transform and convolutional (CONV) kernel computation layer using a set of merged BN and CONV parameters, wherein the processing core is to
for each of the plurality of NN layers of the NN,
form the merged BN transform and CONV kernel computation layer to compute merged BN layer and CONV layer functions using the set of merged BN and CONV parameters; and
merge a rectified linear unit (RELU) layer function with the merged BN layer and CONV layer functions to form a merged BN/CONV/RELU layer.
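For illustration only (this is not language from the patent): the merged BN and CONV parameters referred to in claim 13 are conventionally obtained by folding the inference-time BN scale, shift, running mean, and variance into the convolution weights and bias, after which the RELU can be applied directly to the single merged output. The Python/NumPy sketch below shows that arithmetic under those assumptions; all function and variable names (fold_bn_into_conv, merged_bn_conv_relu, eps, and so on) are placeholders for the example, not identifiers from the patent.

import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    # w: (out_ch, in_ch, kh, kw) CONV kernel; b: (out_ch,) CONV bias
    # gamma/beta/mean/var: (out_ch,) frozen BN parameters and running statistics
    scale = gamma / np.sqrt(var + eps)              # per-output-channel BN scale
    merged_w = w * scale[:, None, None, None]       # fold the scale into each output filter
    merged_b = (b - mean) * scale + beta            # fold the mean and shift into the bias
    return merged_w, merged_b

def merged_bn_conv_relu(x, merged_w, merged_b):
    # x: (n, in_ch, h, w) input; stride 1 and no padding, for brevity
    n, _, h, w = x.shape
    out_ch, _, kh, kw = merged_w.shape
    out = np.empty((n, out_ch, h - kh + 1, w - kw + 1))
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + kh, j:j + kw]     # (n, in_ch, kh, kw) sliding window
            out[:, :, i, j] = np.tensordot(patch, merged_w,
                                           axes=([1, 2, 3], [1, 2, 3])) + merged_b
    return np.maximum(out, 0.0)                     # merged RELU layer function

Because the folded scale and bias are constants at inference time, the BN, CONV, and RELU layer functions collapse into a single pass over the data, which is consistent with the quantization for model inference named in the title.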