US 12,346,816 B2
Neural networks for embedded devices
Forrest Nelson Iandola, San Jose, CA (US); Harsimran Singh Sidhu, Fremont, CA (US); and Yiqi Hou, Berkeley, CA (US)
Assigned to Tesla, Inc., Austin, TX (US)
Filed by Tesla, Inc., Austin, TX (US)
Filed on May 14, 2024, as Appl. No. 18/664,035.
Application 18/664,035 is a continuation of application No. 18/156,628, filed on Jan. 19, 2023, granted, now Pat. No. 11,983,630.
Application 18/156,628 is a continuation of application No. 16/559,483, filed on Sep. 3, 2019, granted, now Pat. No. 11,562,231.
Claims priority of provisional application 62/726,396, filed on Sep. 3, 2018.
Prior Publication US 2024/0296330 A1, Sep. 5, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06F 7/575 (2006.01)
CPC G06N 3/08 (2013.01) [G06F 7/575 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method of generating a neural network structure including one or more input layers each associated with one or more filters, the method comprising:
determining, for an architecture of a device, a bit length of a set of registers of the device used to perform arithmetic operations;
determining a first integer representation for the one or more input layers and a second integer representation for the one or more filters, the first integer representation associated with a first range of integer values and the second integer representation associated with a second range of integer values;
generating dimensionalities of the one or more input layers and the one or more filters, the dimensionalities determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the registers; and
generating the neural network structure with the determined dimensionalities, wherein for at least one layer which forms the neural network structure, an activation is quantized using an activation parameter for the at least one layer and a weight is quantized using a layer parameter for the at least one layer.
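The overflow constraint recited in the claim can be sketched in a few lines: given the register bit length and the chosen integer representations for activations and filter weights, compute the largest number of multiply-accumulate terms whose worst-case sum still fits in the register. This is an illustrative reading only, not the patented implementation; the function names, the choice of a signed accumulator, and the example bit widths are assumptions, not taken from the patent.

```python
# Illustrative sketch of the claimed dimensionality check (all names are
# hypothetical, not from the patent).

def max_int(bits: int, signed: bool) -> int:
    """Largest magnitude representable in an integer of `bits` bits."""
    return (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1

def max_dot_product_length(register_bits: int,
                           act_bits: int, act_signed: bool,
                           weight_bits: int, weight_signed: bool) -> int:
    """Largest n such that n * max_activation * max_weight does not
    overflow the accumulator register (assumed signed here)."""
    worst_product = (max_int(act_bits, act_signed)
                     * max_int(weight_bits, weight_signed))
    accumulator_max = max_int(register_bits, signed=True)
    return accumulator_max // worst_product

# Example: 32-bit registers, 8-bit unsigned activations, 8-bit signed
# weights. Any filter whose element count (height x width x channels)
# stays at or below this bound cannot overflow the accumulator, even
# when every element takes its maximum value.
n = max_dot_product_length(32, 8, False, 8, True)  # → 66311
```

Under these assumed widths, the bound (66,311 terms) comfortably admits common filter shapes such as 3×3×512; the generation step in the claim would reject or resize any layer/filter dimensionality that exceeds it.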