US 11,869,232 B2
Deep learning inference efficiency technology with early exit and speculative execution
Haim Barad, Zichron Yaakov (IL); Barak Hurwitz, Kibbutz Alonim (IL); Uzi Sarel, Zichron-Yaakov (IL); Eran Geva, Haifa (IL); Eli Kfir, Yakir (IL); and Moshe Island, Tel Mond (IL)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jan. 9, 2023, as Appl. No. 18/151,914.
Application 18/151,914 is a continuation of application No. 16/266,880, filed on Feb. 4, 2019, granted, now 11,562,200.
Prior Publication US 2023/0215158 A1, Jul. 6, 2023
Int. Cl. G06V 10/82 (2022.01); G06N 3/04 (2023.01); G06F 30/33 (2020.01); G06V 10/44 (2022.01); G06V 10/94 (2022.01)
CPC G06V 10/82 (2022.01) [G06F 30/33 (2020.01); G06N 3/04 (2013.01); G06V 10/454 (2022.01); G06V 10/955 (2022.01)]
20 Claims
 
1. A method comprising:
receiving an input for an inference to be performed by a neural network, the input comprising a first batch of one or more samples and a second batch of one or more samples, the neural network comprising a first subset of one or more layers and a second subset of one or more layers;
generating an intermediate output by processing the first batch of one or more samples in the first subset of one or more layers;
determining whether the intermediate output satisfies one or more exit criteria;
determining a position of an intermediate exit based on estimated memory consumption by at least one layer in the first subset of one or more layers, wherein the position is between the first subset of one or more layers and the second subset of one or more layers;
placing the intermediate exit at the position between the first subset of one or more layers and the second subset of one or more layers;
after determining that the intermediate output satisfies the one or more exit criteria, causing the first batch of one or more samples to exit the neural network at the intermediate exit;
causing the second batch of one or more samples to be processed in the first subset of one or more layers; and
causing the second batch of one or more samples to be processed in the second subset of one or more layers.
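
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (choose_exit_position, infer, confidence, double), the byte estimates, the memory budget, and the 0.9 threshold are hypothetical and chosen only for illustration. The placement heuristic (the largest prefix of layers whose cumulative estimated memory fits a budget) and the exit criterion (softmax confidence of the intermediate output) are assumptions; the claim requires only that the position be based on estimated memory consumption and that one or more exit criteria be tested.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def choose_exit_position(estimated_layer_bytes: Sequence[int], budget_bytes: int) -> int:
    """Return k so that the exit sits between layer k and layer k+1.

    Assumed heuristic: take the largest prefix of layers whose cumulative
    estimated memory fits within budget_bytes.
    """
    position = 1  # keep at least one layer in the first subset
    total = 0
    # Skip the last layer so the second subset is never empty.
    for index, layer_bytes in enumerate(estimated_layer_bytes[:-1], start=1):
        total += layer_bytes
        if total > budget_bytes:
            break
        position = index
    return position


def run_layers(layers: Sequence[Layer], batch: List[Vector]) -> List[Vector]:
    """Apply each layer in order to every sample of the batch."""
    for layer in layers:
        batch = [layer(sample) for sample in batch]
    return batch


def confidence(logits: Vector) -> float:
    """Largest softmax probability, used here as the assumed exit criterion."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return max(exps) / sum(exps)


def infer(batch: List[Vector], first_subset: Sequence[Layer],
          second_subset: Sequence[Layer], threshold: float) -> Tuple[List[Vector], str]:
    """Run the first subset, then exit early or continue into the second subset."""
    intermediate = run_layers(first_subset, batch)
    if all(confidence(sample) >= threshold for sample in intermediate):
        return intermediate, "exit"  # batch leaves at the intermediate exit
    return run_layers(second_subset, intermediate), "full"


def double(v: Vector) -> Vector:
    """Toy stand-in for a real layer."""
    return [2.0 * x for x in v]


layers = [double, double, double, double]
# Hypothetical per-layer memory estimates (bytes) and budget.
position = choose_exit_position([100, 200, 400, 800], budget_bytes=350)
first_subset, second_subset = layers[:position], layers[position:]

# First batch: confident after the first subset, so it exits early.
first_out, first_route = infer([[4.0, 0.0]], first_subset, second_subset, 0.9)
# Second batch: not confident here, so it passes through both subsets.
second_out, second_route = infer([[0.1, 0.0]], first_subset, second_subset, 0.9)
```

In this toy run, the exit is placed after the second layer. The second batch is assumed not to satisfy the exit criterion, so it is processed in both subsets; the claim itself does not state why the second batch continues.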