US 12,254,400 B2
Optimizing artificial neural network computations based on automatic determination of a batch size
Benoit Chappet de Vangel, Paris (FR); Thomas Cagnac, Longpont sur Orge (FR); Benjamin Poumarede, Paris (FR); and Ludovic Larzul, El Dorado Hills, CA (US)
Assigned to Mipsology SAS, Palaiseau (FR)
Filed by Mipsology SAS, Palaiseau (FR)
Filed on Jan. 10, 2019, as Appl. No. 16/244,267.
Prior Publication US 2020/0226458 A1, Jul. 16, 2020
Int. Cl. G06N 3/08 (2023.01)
CPC G06N 3/08 (2013.01) 20 Claims
OG exemplary drawing
 
1. A system comprising:
a computation circuit configured to perform computations of one or more layers of an artificial neural network (ANN) for a series of input datasets; and
one or more processors in communication with the computation circuit, wherein the one or more processors are configured to initiate operations including:
receiving an ANN structure and a user input, the ANN structure being associated with the ANN and the user input including a specified performance measure for the ANN; and
generating, based on the ANN structure and the user input, a configuration for the computation circuit and a memory configuration associated with the ANN, the memory configuration including a plurality of memories allocated to storing data associated with the one or more layers of the ANN, wherein the configuration corresponds to the specified performance measure and includes information concerning batch sizes of the one or more layers of the ANN, and wherein:
the one or more layers of the ANN are assigned the batch sizes based on the configuration;
the ANN includes a first layer, a second layer, and a third layer, wherein an output of the first layer is an input to the second layer and the third layer; and
the computation circuit is configured to:
determine a first latency of storing the data associated with the one or more layers in the plurality of memories when a computation of the first layer is executed prior to a computation of the second layer;
determine a second latency of storing the data associated with the one or more layers in the plurality of memories when the computation of the first layer is executed after the computation of the second layer;
determine that the second latency is less than the first latency; and
in response to the determination that the second latency is less than the first latency, perform at least one computation of the first layer before a computation of the second layer and at least one further computation of the first layer after the computation of the second layer and prior to a computation of the third layer.
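For illustration only, the following Python sketch models the scheduling decision recited in claim 1: it estimates the storage latency of two candidate execution orders under a deliberately crude memory model and, when the order that defers part of the first layer's computations has the lower estimate, splits the first layer's batch around the second layer before the third layer runs. The helper names, the latency model, and the toy sizes (storage_latency, choose_order, the fast/slow memory costs) are hypothetical stand-ins and are not drawn from the patent, which directs these determinations to a hardware computation circuit and its memory configuration.

# Illustrative sketch only: not the patented hardware implementation.
# The latency model, helper names, and sizes below are hypothetical.

from typing import List, Tuple

# A schedule step: (layer label, bytes written to memory, bytes freed after the step).
Step = Tuple[str, int, int]


def storage_latency(schedule: List[Step], fast_capacity: int,
                    fast_cost: float = 1.0, slow_cost: float = 4.0) -> float:
    """Crude proxy for the latency of storing layer outputs: writes that fit
    in the remaining fast memory are cheap, the overflow is charged a
    slow-memory penalty, and freed bytes are released from fast memory first
    (all of this is an assumption, not the patent's memory configuration)."""
    fast_resident = 0
    latency = 0.0
    for _label, written, freed in schedule:
        to_fast = min(written, fast_capacity - fast_resident)
        latency += to_fast * fast_cost + (written - to_fast) * slow_cost
        fast_resident = max(0, fast_resident + to_fast - freed)
    return latency


def choose_order(batch: int, out1: int, out2: int, out3: int,
                 fast_capacity: int, split: int) -> List[Step]:
    """Compare the estimated storage latency of running the first layer
    entirely before the second layer against a split order in which part of
    the first layer runs after the second layer and before the third layer,
    and return whichever candidate has the lower estimate."""
    # Candidate A: first layer for the whole batch, then second, then third.
    first_then_second: List[Step] = [
        ("first",  batch * out1, 0),
        ("second", batch * out2, batch * out1),
        ("third",  batch * out3, batch * out2),
    ]
    # Candidate B: split the first layer's computations around the second layer,
    # finishing the first layer before the third layer runs (as in the claim).
    split_order: List[Step] = [
        ("first",  split * out1, 0),
        ("second", split * out2, split * out1),
        ("first",  (batch - split) * out1, 0),
        ("second", (batch - split) * out2, (batch - split) * out1),
        ("third",  batch * out3, batch * out2),
    ]
    first_latency = storage_latency(first_then_second, fast_capacity)
    second_latency = storage_latency(split_order, fast_capacity)
    return split_order if second_latency < first_latency else first_then_second


if __name__ == "__main__":
    # Toy sizes: a batch of 4 inputs, 2 MiB of first-layer output and 1 MiB of
    # second- and third-layer output per input, and 4 MiB of fast memory.
    chosen = choose_order(batch=4, out1=2 << 20, out2=1 << 20, out3=1 << 20,
                          fast_capacity=4 << 20, split=2)
    print([label for label, _, _ in chosen])

With these toy sizes the split order has the lower estimated storage latency, so the script prints the interleaved order ['first', 'second', 'first', 'second', 'third'], mirroring the claim's response to the determination that the second latency is less than the first.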