CPC G06V 10/82 (2022.01) [G06F 30/33 (2020.01); G06N 3/04 (2013.01); G06V 10/454 (2022.01); G06V 10/955 (2022.01)] | 20 Claims |
1. A method comprising:
receiving an input for an inference to be performed by a neural network, the input comprising a first batch of one or more samples and a second batch of one or more samples, the neural network comprising a first subset of one or more layers and a second subset of one or more layers;
generating an intermediate output by processing the first batch of one or more samples in the first subset of one or more layers;
determining whether the intermediate output satisfies one or more exit criteria;
determining a position of an intermediate exit based on estimated memory consumption by at least one layer in the first subset of one or more layers, wherein the position is between the first subset of one or more layers and the second subset of one or more layers;
placing the intermediate exit at the position between the first subset of one or more layers and the second subset of one or more layers;
after determining that the intermediate output satisfies the one or more exit criteria, causing the first batch of one or more samples to exit the neural network at the intermediate exit;
causing the second batch of one or more samples to be processed in the first subset of one or more layers; and
causing the second batch of one or more samples to be processed in the second subset of one or more layers.
|
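For illustration only, the steps recited in claim 1 can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation. The claim does not say how the memory estimate maps to a position or what the exit criteria are. The budget-based placement heuristic, the threshold criterion, and every name used here (`Layer`, `choose_exit_position`, `satisfies_exit_criteria`, `infer`, and the toy layers) are hypothetical. The sketch also assumes a network of at least two layers, and it handles a first batch that fails the criteria by sending it on through the second subset, a case the claim does not recite.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Batch = List[List[float]]  # a batch is a list of samples; a sample is a list of floats


@dataclass
class Layer:
    fn: Callable[[Batch], Batch]
    est_memory_bytes: int  # hypothetical per-layer memory estimate


def choose_exit_position(layers: Sequence[Layer], budget_bytes: int) -> int:
    """Return k so that the exit sits between layers[:k] and layers[k:].

    Assumed heuristic: grow the first subset while its cumulative estimated
    memory stays within the budget, keeping at least one layer on each side.
    """
    total, k = 0, 0
    for layer in layers[:-1]:  # never absorb the last layer
        if k >= 1 and total + layer.est_memory_bytes > budget_bytes:
            break
        total += layer.est_memory_bytes
        k += 1
    return k


def run(layers: Sequence[Layer], batch: Batch) -> Batch:
    """Process a batch through the given layers in order."""
    for layer in layers:
        batch = layer.fn(batch)
    return batch


def satisfies_exit_criteria(intermediate: Batch, threshold: float) -> bool:
    # Assumed criterion: every sample's largest value reaches the threshold.
    return all(max(sample) >= threshold for sample in intermediate)


def infer(
    layers: Sequence[Layer],
    first_batch: Batch,
    second_batch: Batch,
    budget_bytes: int,
    threshold: float,
) -> Tuple[Batch, bool, Batch]:
    # Determine the position from the memory estimates and place the exit there.
    k = choose_exit_position(layers, budget_bytes)
    first_subset, second_subset = layers[:k], layers[k:]

    # Generate the intermediate output for the first batch.
    intermediate = run(first_subset, first_batch)
    if satisfies_exit_criteria(intermediate, threshold):
        # The first batch leaves the network at the intermediate exit.
        first_result, exited = intermediate, True
    else:
        # Not recited in the claim: fall through to the remaining layers.
        first_result, exited = run(second_subset, intermediate), False

    # The second batch is processed in the first subset, then the second subset.
    second_result = run(second_subset, run(first_subset, second_batch))
    return first_result, exited, second_result


# Toy layers standing in for real network layers.
double = Layer(lambda b: [[x * 2 for x in s] for s in b], est_memory_bytes=100)
add_one = Layer(lambda b: [[x + 1 for x in s] for s in b], est_memory_bytes=200)
negate = Layer(lambda b: [[-x for x in s] for s in b], est_memory_bytes=400)

first_result, exited, second_result = infer(
    [double, add_one, negate],
    first_batch=[[0.5, 0.25]],
    second_batch=[[0.0, 0.25]],
    budget_bytes=350,
    threshold=1.5,
)
print(exited, first_result, second_result)
```

With these toy values the exit lands after the second layer, so the first batch leaves the network without touching `negate`, while the second batch passes through all three layers. A real system would replace the threshold test with whatever exit criteria the network uses and would derive the per-layer estimates from the target hardware.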