CPC G06N 3/0472 (2013.01) [G06F 17/15 (2013.01); G06F 17/16 (2013.01); G06K 9/6262 (2013.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06N 3/084 (2013.01); G06K 9/6267 (2013.01); G06T 7/70 (2017.01); G06T 2207/20084 (2013.01); G06V 40/10 (2022.01)]
27 Claims
1. A method, comprising:
receiving input data at a convolutional neural network (CNN) model;
generating a factorized computation network comprising a first plurality of connections between a first layer of the CNN model and a second layer of the CNN model, wherein:
the factorized computation network comprises N inputs,
the factorized computation network comprises M outputs, and
the factorized computation network comprises at least one path from every input of the N inputs to every output of the M outputs;
setting a connection weight for each connection of a second plurality of connections in the factorized computation network to 1 so that a weight density for the factorized computation network is <100%;
performing fast pointwise convolution using the factorized computation network to generate fast pointwise convolution output; and
providing the fast pointwise convolution output to the second layer of the CNN model.
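
For illustration only, the steps recited in claim 1 can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the claimed implementation: it assumes N = M = a power of two, assumes a butterfly-style wiring with log2(N) stages, and takes "weight density" to mean the fraction of connections that carry a learned weight rather than a weight fixed to 1. The names build_stages, fast_pointwise_conv, has_all_paths, and weight_density are hypothetical and do not come from the claims.

```python
import numpy as np


def build_stages(n, rng):
    """Build a butterfly-style factorized network with n inputs and n outputs.

    Assumption: n is a power of two. Each stage gives every node two incoming
    connections: a "straight" one from the same index, whose weight is fixed
    to 1, and a "cross" one from a partner index, whose weight is learned
    (here just randomly initialized).
    """
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two >= 2")
    idx = np.arange(n)
    stages = []
    stride = 1
    while stride < n:
        partner = idx ^ stride            # butterfly partner at this stage
        cross_w = rng.standard_normal(n)  # one learned weight per node
        stages.append((partner, cross_w))
        stride *= 2
    return stages


def fast_pointwise_conv(x, stages):
    """Apply the factorized network across channels at every pixel.

    x has shape (channels, height, width). Connections fixed to 1 need no
    multiplication, so each stage costs one multiply per channel per pixel.
    """
    c, h, w = x.shape
    y = x.reshape(c, h * w)
    for partner, cross_w in stages:
        y = y + cross_w[:, None] * y[partner]
    return y.reshape(c, h, w)


def has_all_paths(stages):
    """Check that every input reaches every output through some path."""
    n = len(stages[0][0])
    reach = np.eye(n, dtype=int)
    for partner, _ in stages:
        pattern = np.eye(n, dtype=int)          # straight connections
        pattern[np.arange(n), partner] = 1      # cross connections
        reach = pattern @ reach
    return bool((reach > 0).all())


def weight_density(stages):
    """Fraction of connections with a learned weight (assumed definition)."""
    n = len(stages[0][0])
    total = 2 * n * len(stages)   # two incoming connections per node per stage
    learned = n * len(stages)     # only the cross connections are learned
    return learned / total


rng = np.random.default_rng(0)
stages = build_stages(8, rng)
x = rng.standard_normal((8, 4, 4))   # stand-in for the first layer's output
y = fast_pointwise_conv(x, stages)   # would be passed on to the second layer
```

In this sketch half of the connections are fixed to 1, so the weight density under the assumed definition is 50%, and the three stages for N = 8 use 48 connections in place of the 64 weights of a dense 8 x 8 pointwise convolution.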

10. A processing system, comprising:
a memory comprising computer-executable instructions; and
a first processor configured to execute the computer-executable instructions and cause the processing system to:
receive input data at a convolutional neural network (CNN) model;
generate a factorized computation network comprising a first plurality of connections between a first layer of the CNN model and a second layer of the CNN model, wherein:
the factorized computation network comprises N inputs,
the factorized computation network comprises M outputs, and
the factorized computation network comprises at least one path from every input of the N inputs to every output of the M outputs;
set a connection weight for each connection of a second plurality of connections in the factorized computation network to 1 so that a weight density for the factorized computation network is <100%;
perform fast pointwise convolution using the factorized computation network to generate fast pointwise convolution output; and
provide the fast pointwise convolution output to the second layer of the CNN model.

19. A non-transitory computer-readable medium comprising instructions that, when executed by a first processor of a processing system, cause the processing system to perform a method, the method comprising:
receiving input data at a convolutional neural network (CNN) model;
generating a factorized computation network comprising a first plurality of connections between a first layer of the CNN model and a second layer of the CNN model, wherein:
the factorized computation network comprises N inputs,
the factorized computation network comprises M outputs, and
the factorized computation network comprises at least one path from every input of the N inputs to every output of the M outputs;
setting a connection weight for each connection of a second plurality of connections in the factorized computation network to 1 so that a weight density for the factorized computation network is <100%;
performing fast pointwise convolution using the factorized computation network to generate fast pointwise convolution output; and
providing the fast pointwise convolution output to the second layer of the CNN model.
