US 11,657,282 B2
Efficient inferencing with fast pointwise convolution
Jamie Menjay Lin, San Diego, CA (US); Yang Yang, San Diego, CA (US); and Jilei Hou, San Diego, CA (US)
Assigned to Qualcomm Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Sep. 16, 2019, as Appl. No. 16/571,760.
Prior Publication US 2021/0081765 A1, Mar. 18, 2021
Int. Cl. G06N 3/04 (2023.01); G06F 17/15 (2006.01); G06F 17/16 (2006.01); G06K 9/62 (2022.01); G06N 3/08 (2023.01); G06T 7/70 (2017.01); G06V 40/10 (2022.01); G06N 3/084 (2023.01)
CPC G06N 3/0472 (2013.01) [G06F 17/15 (2013.01); G06F 17/16 (2013.01); G06K 9/6262 (2013.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06N 3/084 (2013.01); G06K 9/6267 (2013.01); G06T 7/70 (2017.01); G06T 2207/20084 (2013.01); G06V 40/10 (2022.01)] 27 Claims
 
1. A method, comprising:
receiving input data at a convolutional neural network (CNN) model;
generating a factorized computation network comprising a first plurality of connections between a first layer of the CNN model and a second layer of the CNN model, wherein:
the factorized computation network comprises N inputs,
the factorized computation network comprises M outputs, and
the factorized computation network comprises at least one path from every input of the N inputs to every output of the M outputs;
setting a connection weight for each connection of a second plurality of connections in the factorized computation network to 1 so that a weight density for the factorized computation network is <100%;
performing fast pointwise convolution using the factorized computation network to generate fast pointwise convolution output; and
providing the fast pointwise convolution output to the second layer of the CNN model.
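Claim 1 does not fix the topology of the factorized computation network. The Python sketch below shows one possible instance. Several assumptions are not taken from the claims: N = M, N is a power of two, and the network follows a butterfly (FFT-style) pattern of log2(N) stages. In each stage, every node receives one identity edge whose weight is set to 1 and one trainable cross edge. Weight density is taken to mean trainable edges divided by all edges. The function names (build_butterfly, weight_density, fully_connected, init_weights, fast_pointwise_conv, to_dense) are hypothetical. The sketch builds the network, confirms that every input has a path to every output, and applies the network as a pointwise (per-position, channel-mixing) convolution.

```python
import random


def build_butterfly(n):
    """Build an assumed butterfly-style factorized network for n channels.

    Returns a list of stages; each stage is a list of (src, dst, fixed) edges.
    fixed=True marks an edge whose weight is set to 1 and never trained.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = []
    stride = 1
    while stride < n:  # log2(n) stages with strides 1, 2, 4, ...
        stage = []
        for dst in range(n):
            stage.append((dst, dst, True))            # identity edge, weight fixed to 1
            stage.append((dst ^ stride, dst, False))  # cross edge, trainable weight
        stages.append(stage)
        stride *= 2
    return stages


def weight_density(stages):
    """Assumed definition: trainable edges / all edges in the network."""
    total = sum(len(stage) for stage in stages)
    trainable = sum(1 for stage in stages for _, _, fixed in stage if not fixed)
    return trainable / total


def fully_connected(stages, n):
    """True if every input has at least one path to every output."""
    reach = [{i} for i in range(n)]  # reach[k]: inputs with a path to node k
    for stage in stages:
        nxt = [set() for _ in range(n)]
        for src, dst, _ in stage:
            nxt[dst] |= reach[src]
        reach = nxt
    return all(len(r) == n for r in reach)


def init_weights(stages, seed=0):
    """Random initial values for the trainable (non-fixed) edges only."""
    rng = random.Random(seed)
    return [{(src, dst): rng.uniform(-1.0, 1.0)
             for src, dst, fixed in stage if not fixed}
            for stage in stages]


def fast_pointwise_conv(pixels, stages, weights):
    """Apply the factorized channel mix independently at each spatial position.

    pixels: list of channel vectors (one per position), each of length n.
    """
    out = []
    for x in pixels:
        for stage, w in zip(stages, weights):
            y = [0.0] * len(x)
            for src, dst, fixed in stage:
                # Fixed edges carry weight 1, so no multiply is needed.
                y[dst] += x[src] if fixed else w[(src, dst)] * x[src]
            x = y
        out.append(x)
    return out


def to_dense(stages, weights, n):
    """Collapse the network into the equivalent n x n matrix (rows are outputs)."""
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    cols = fast_pointwise_conv(basis, stages, weights)  # cols[i] = response to e_i
    return [[cols[i][r] for i in range(n)] for r in range(n)]


if __name__ == "__main__":
    n = 8
    stages = build_butterfly(n)
    print(len(stages), weight_density(stages), fully_connected(stages, n))  # 3 0.5 True
    dense = to_dense(stages, init_weights(stages), n)
    print(all(v != 0.0 for row in dense for v in row))  # True: no zero entries
```

In this assumed layout, each input reaches each output along exactly one path. The collapsed matrix is therefore fully dense, even though half of the network's edges carry no trainable weight.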
 
10. A processing system, comprising:
a memory comprising computer-executable instructions; and
a first processor configured to execute the computer-executable instructions and cause the processing system to:
receive input data at a convolutional neural network (CNN) model;
generate a factorized computation network comprising a first plurality of connections between a first layer of the CNN model and a second layer of the CNN model, wherein:
the factorized computation network comprises N inputs,
the factorized computation network comprises M outputs, and
the factorized computation network comprises at least one path from every input of the N inputs to every output of the M outputs;
set a connection weight for each connection of a second plurality of connections in the factorized computation network to 1 so that a weight density for the factorized computation network is <100%;
perform fast pointwise convolution using the factorized computation network to generate fast pointwise convolution output; and
provide the fast pointwise convolution output to the second layer of the CNN model.
 
19. A non-transitory computer-readable medium comprising instructions that, when executed by a first processor of a processing system, cause the processing system to perform a method, the method comprising:
receiving input data at a convolutional neural network (CNN) model;
generating a factorized computation network comprising a first plurality of connections between a first layer of the CNN model and a second layer of the CNN model, wherein:
the factorized computation network comprises N inputs,
the factorized computation network comprises M outputs, and
the factorized computation network comprises at least one path from every input of the N inputs to every output of the M outputs;
setting a connection weight for each connection of a second plurality of connections in the factorized computation network to 1 so that a weight density for the factorized computation network is <100%;
performing fast pointwise convolution using the factorized computation network to generate fast pointwise convolution output; and
providing the fast pointwise convolution output to the second layer of the CNN model.
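Claims 1, 10, and 19 all recite the same weight density condition. The saving behind "efficient inferencing" can be estimated by counting weights and multiplications for one pointwise layer. The Python sketch below assumes several things that the claims do not state: N = M, N is a power of two, and the network is a butterfly (FFT-style) factorization with log2(N) stages. In each stage, every node receives one identity edge fixed to 1 and one trainable cross edge. The function name pointwise_costs is hypothetical. The result is a back-of-the-envelope estimate, not a figure from the patent, and it ignores additions and memory traffic.

```python
def pointwise_costs(n, height, width):
    """Weights and multiplications for one pointwise layer, dense vs factorized.

    Assumes n input channels == n output channels, n a power of two, and a
    butterfly factorization with log2(n) stages in which each node has one
    identity edge fixed to 1 and one trainable cross edge.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1          # log2(n), exact for powers of two
    positions = height * width           # pointwise conv repeats per position
    dense_weights = n * n                # one weight per (input, output) pair
    fact_edges = 2 * n * stages          # two incoming edges per node per stage
    fact_weights = n * stages            # only cross edges are trainable
    return {
        "dense_weights": dense_weights,
        "factorized_edges": fact_edges,
        "factorized_weights": fact_weights,
        "weight_density": fact_weights / fact_edges,
        # Edges fixed to 1 need no multiply, so only trainable edges cost one.
        "dense_mults": dense_weights * positions,
        "factorized_mults": fact_weights * positions,
    }


if __name__ == "__main__":
    costs = pointwise_costs(64, 56, 56)
    print(costs["dense_mults"], costs["factorized_mults"])  # 12845056 1204224
```

Under these assumptions, a 64-channel layer on a 56 x 56 feature map drops from 4,096 weights to 384. The multiplication count falls by a factor of N / log2(N), about 10.7x here. The weight density is 50%.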