US 11,983,630 B2
Neural networks for embedded devices
Forrest Nelson Iandola, San Jose, CA (US); Harsimran Singh Sidhu, Fremont, CA (US); and Yiqi Hou, Berkeley, CA (US)
Assigned to Tesla, Inc., Austin, TX (US)
Filed by Tesla, Inc., Austin, TX (US)
Filed on Jan. 19, 2023, as Appl. No. 18/156,628.
Application 18/156,628 is a continuation of application No. 16/559,483, filed on Sep. 3, 2019, granted, now Pat. No. 11,562,231.
Claims priority of provisional application 62/726,396, filed on Sep. 3, 2018.
Prior Publication US 2023/0237331 A1, Jul. 27, 2023
Int. Cl. G06N 3/08 (2023.01); G06F 7/575 (2006.01)
CPC G06N 3/08 (2013.01) [G06F 7/575 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method of generating a neural network structure including one or more input layers each associated with one or more filters, the method comprising:
determining, for an architecture of a device, a bit length of a set of registers of the device used to perform arithmetic operations;
determining a first integer representation for the one or more input layers and a second integer representation for the one or more filters, the first integer representation associated with a first range of integer values and the second integer representation associated with a second range of integer values;
generating dimensionalities of the one or more input layers and the one or more filters, the dimensionalities determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the registers; and
generating the neural network structure with the determined dimensionalities, wherein for each individual layer which forms the neural network structure, activations are quantized using activation parameters for the individual layer and the weights are quantized using layer parameters for the individual layer, wherein the activations and the activation parameters are provided as input to the individual layer, and wherein quantized output associated with the individual layer is dequantized using (1) the input activation parameters and (2) the layer parameters.
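The dimensionality constraint recited in claim 1 bounds how many input-filter element products can be accumulated before a worst-case result exceeds the register bit length. A minimal sketch of that bound, assuming a signed accumulator register, unsigned activations, and signed two's-complement weights (the `max_accumulations` helper and the specific bit widths are illustrative assumptions, not taken from the patent):

```python
def max_accumulations(register_bits: int, act_bits: int, weight_bits: int) -> int:
    """Largest count of worst-case activation-weight products that fits
    in a signed accumulator register without overflowing."""
    acc_max = 2 ** (register_bits - 1) - 1   # signed register capacity
    act_max = 2 ** act_bits - 1              # unsigned activation maximum
    weight_max = 2 ** (weight_bits - 1)      # magnitude of the signed minimum
    return acc_max // (act_max * weight_max)

# With 32-bit registers, 8-bit activations, and 8-bit weights, a filter may
# combine up to 65,793 elements (e.g. height x width x channels) before
# worst-case accumulation could overflow.
print(max_accumulations(32, 8, 8))  # → 65793
```

Under these assumptions, the generated dimensionalities of each input layer and its corresponding filter would be chosen so that their element count stays at or below this figure.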
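The per-layer flow in the final limitation, quantizing activations with the layer's activation parameters, quantizing weights with the layer parameters, and dequantizing the output using both, can be sketched with symmetric per-tensor scaling. This is a hedged illustration only; the `quantize` helper, the scale choice, and the tensor shapes are assumptions, not the patent's method:

```python
import numpy as np

def quantize(x, n_bits, signed):
    """Symmetric per-tensor quantization to n_bits integers.
    Returns the integer tensor and its scale (an assumed parameter form)."""
    qmax = 2 ** (n_bits - 1) - 1 if signed else 2 ** n_bits - 1
    qmin = -qmax - 1 if signed else 0
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q.astype(np.int32), scale

rng = np.random.default_rng(0)
a = rng.uniform(0, 1, (1, 16))    # activations (non-negative, e.g. post-ReLU)
w = rng.uniform(-1, 1, (16, 8))   # the individual layer's weights

qa, scale_a = quantize(a, 8, signed=False)  # activation parameters for the layer
qw, scale_w = quantize(w, 8, signed=True)   # layer parameters for the weights

acc = qa @ qw                     # integer accumulation in 32-bit registers
y = acc * (scale_a * scale_w)     # dequantize using (1) activation and (2) layer params
```

The dequantized `y` approximates the floating-point product `a @ w`; the approximation error shrinks as the integer representations widen, subject to the overflow bound above.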