CPC G06N 3/08 (2013.01) [G06F 17/16 (2013.01); G06N 3/02 (2013.01); G06V 10/82 (2022.01); G06T 2207/20 (2013.01)]
18 Claims
1. A computer-implemented method for processing a neural network associated with an input matrix having a first number of elements, comprising:
dividing the input matrix into a plurality of vectors, each vector having a second number of elements;
grouping the plurality of vectors into a first group of vectors and a second group of vectors;
assigning vectors in the first group to a plurality of buckets according to a position of a key element in each vector of the first group;
pruning the first group of vectors and the second group of vectors; and
executing the neural network using the first group of pruned vectors and the second group of pruned vectors, wherein the neural network is executed using the first group of pruned vectors in parallel threads corresponding to the plurality of buckets,
wherein grouping the plurality of vectors into the first group of vectors and the second group of vectors further comprises:
determining a pruning ratio for the input matrix;
determining parameters of the first group of vectors and the second group of vectors based on the first number, the second number, and the pruning ratio; and
grouping the plurality of vectors into the first group of vectors and the second group of vectors based on the determined parameters,
the parameters comprising at least one of a bucket size for the plurality of buckets in the first group, a number of empty vectors in the first group, a size of the second group, or a number of non-zero elements to be retained in the second group.
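The dividing, grouping, bucketing, pruning and parallel-execution steps of claim 1 can be illustrated with a minimal pure-Python sketch. It is not the claimed implementation. Several choices are hypothetical assumptions made only for illustration: the "key element" is taken to be the largest-magnitude element, first-group vectors keep only their key element, second-group vectors keep their top-k elements, and the second group is picked by largest L1 norm. Execution is modeled as a dot product of each pruned vector with a shared activation vector `x`, using one thread per bucket. All function names (`divide_matrix`, `key_position`, `group_vectors`, `prune_vector`, `assign_buckets`, `run_buckets`) are also hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def divide_matrix(flat, m):
    """Split a flattened input matrix into vectors of m elements (assumes len(flat) % m == 0)."""
    return [flat[i:i + m] for i in range(0, len(flat), m)]


def key_position(vec):
    """Hypothetical key element: index of the first largest-magnitude element."""
    return max(range(len(vec)), key=lambda i: abs(vec[i]))


def group_vectors(vectors, second_size):
    """Hypothetical grouping: the second_size vectors with the largest L1 norm form the
    second group, and the rest form the first group. Returns original vector indices."""
    order = sorted(range(len(vectors)),
                   key=lambda i: sum(abs(v) for v in vectors[i]), reverse=True)
    return order[second_size:], order[:second_size]


def prune_vector(vec, k):
    """Keep the k largest-magnitude elements of vec and zero out the rest.
    sorted() is stable, so ties resolve the same way as in key_position()."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [v if i in keep else 0 for i, v in enumerate(vec)]


def assign_buckets(vectors, first, m):
    """Place each first-group vector index into the bucket named by its key position."""
    buckets = [[] for _ in range(m)]
    for idx in first:
        buckets[key_position(vectors[idx])].append(idx)
    return buckets


def run_buckets(pruned, buckets, x):
    """One dot product per first-group vector, one thread per bucket. Every vector in
    bucket p keeps only position p, so its thread reads only x[p] and does work
    proportional to the bucket size."""
    def work(p, bucket):
        return {idx: pruned[idx][p] * x[p] for idx in bucket}

    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(buckets))) as pool:
        for partial in pool.map(work, range(len(buckets)), buckets):
            results.update(partial)
    return results


if __name__ == "__main__":
    m = 4
    flat = [0.9, 0.1, 0.0, 0.2,
            0.1, 0.8, 0.3, 0.0,
            0.5, 0.6, 0.7, 0.9,
            0.0, 0.0, 0.4, 0.1]
    vectors = divide_matrix(flat, m)
    first, second = group_vectors(vectors, second_size=1)
    buckets = assign_buckets(vectors, first, m)
    pruned = {i: prune_vector(vectors[i], 1) for i in first}      # key element only
    pruned.update({i: prune_vector(vectors[i], 2) for i in second})  # top-2 elements
    x = [1.0, 2.0, 3.0, 4.0]
    first_out = run_buckets(pruned, buckets, x)
    second_out = {i: sum(w * a for w, a in zip(pruned[i], x)) for i in second}
    print(buckets, first_out, second_out)
```

Because every vector in a bucket has its single retained element at the same position, threads assigned to different buckets do uniform, branch-free work. This is one plausible motivation for bucketing by key-element position. The sketch does not claim to reproduce how the patented method balances or schedules those threads.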
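The claim states that the grouping parameters are derived from the first number (matrix elements), the second number (vector length) and the pruning ratio, but it does not give the formula. The following sketch shows one hypothetical way such a derivation could work. It assumes a non-zero budget of `round(n * (1 - ratio))`. Under this assumption, each non-empty first-group vector keeps exactly one element, and each second-group vector keeps `second_nonzeros` elements, read here per vector. Excess first-group vectors become empty when the budget is smaller than the vector count. The bucket size is the size each of the `m` buckets would have if key positions were spread evenly. The names `GroupParams`, `determine_parameters` and `second_nonzeros` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupParams:
    bucket_size: int        # target vectors per bucket in the first group
    empty_vectors: int      # first-group vectors pruned to all zeros
    second_group_size: int  # number of vectors in the second group
    second_nonzeros: int    # non-zeros retained per second-group vector


def determine_parameters(n, m, ratio, second_nonzeros):
    """Hypothetical budget split between a one-non-zero-per-vector first group
    and a second group that keeps `second_nonzeros` elements per vector."""
    if n % m != 0 or not 0.0 <= ratio < 1.0 or not 1 < second_nonzeros <= m:
        raise ValueError("unsupported arguments")
    num_vectors = n // m
    budget = round(n * (1.0 - ratio))  # non-zeros to keep in the whole matrix
    if budget <= num_vectors:
        # Fewer non-zeros than vectors: no second group, and some vectors become empty.
        second_size = 0
        empty = num_vectors - budget
    else:
        # Largest S with (V - S) * 1 + S * k <= budget, capped at V.
        second_size = min(num_vectors, (budget - num_vectors) // (second_nonzeros - 1))
        empty = 0
    nonempty_first = num_vectors - second_size - empty
    bucket_size = math.ceil(nonempty_first / m)  # even spread over m key positions
    return GroupParams(bucket_size, empty, second_size, second_nonzeros)


# determine_parameters(1024, 16, 0.9, 4)
# -> GroupParams(bucket_size=4, empty_vectors=0, second_group_size=12, second_nonzeros=4)
```

Rounding the second-group size down keeps the retained count at or below the budget. In the example above, 52 first-group elements plus 12 × 4 second-group elements gives 100 non-zeros, within a budget of 102.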