US 12,141,685 B2
	Low power hardware architecture for a convolutional neural network
Jian hui Huang, Los Altos, CA (US); James Michael Bodwin, Cupertino, CA (US); Pradeep R. Joginipally, San Jose, CA (US); Shabarivas Abhiram, Mountain View, CA (US); Gary S. Goldman, Los Altos, CA (US); Martin Stefan Patz, Bavaria (DE); Eugene M. Feinberg, San Jose, CA (US); and Berend Ozceri, Los Gatos, CA (US)
Assigned to Recogni Inc., San Jose, CA (US)
Filed by Recogni Inc., San Jose, CA (US)
Filed on Jan. 11, 2024, as Appl. No. 18/410,736.
Application 18/410,736 is a continuation of application No. 16/948,164, filed on Sep. 4, 2020, granted, now 11,915,126.
Prior Publication US 2024/0143988 A1, May 2, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/04 (2023.01); G06F 7/50 (2006.01); G06F 7/544 (2006.01); G06N 3/0464 (2023.01); G06N 3/06 (2006.01); G06N 3/063 (2023.01)

CPC G06N 3/063 (2013.01) [G06F 7/50 (2013.01); G06F 7/5443 (2013.01); G06N 3/0464 (2023.01)]

5 Claims

1. A method, comprising:

receiving, by a first computing unit, a first plurality of quantized activation values represented by a first plurality of activation mantissa values and a first activation exponent shared by the first plurality of activation mantissa values, wherein the first plurality of quantized activation values is a quantized representation of a first matrix with values

receiving, by the first computing unit, a first quantized convolutional kernel represented by a first plurality of kernel mantissa values and a first kernel exponent shared by the first plurality of kernel mantissa values;

computing, by the first computing unit, a first dot product of the first plurality of activation mantissa values and the first plurality of kernel mantissa values;

computing, by the first computing unit, a first sum of the first shared activation exponent and the first shared kernel exponent;

receiving, by a second computing unit, the first plurality of quantized activation values;

receiving, by the second computing unit, a second quantized convolutional kernel represented by a second plurality of kernel mantissa values and a second kernel exponent shared by the second plurality of kernel mantissa values;

computing, by the second computing unit, a second dot product of the first plurality of activation mantissa values and the second plurality of kernel mantissa values;

computing, by the second computing unit, a second sum of the first shared activation exponent and the second shared kernel exponent;

receiving, by a third computing unit, a second plurality of quantized activation values represented by a second plurality of activation mantissa values and a second activation exponent shared by the second plurality of activation mantissa values, wherein the second plurality of quantized activation values is a quantized representation of a second matrix with values

wherein six of the values of the first matrix are identical to six of the values of the second matrix;

receiving, by the third computing unit, the first quantized convolutional kernel;

computing, by the third computing unit, a third dot product of the second plurality of activation mantissa values and the first plurality of kernel mantissa values;

computing, by the third computing unit, a third sum of the second shared activation exponent and the first shared kernel exponent;

receiving, by a fourth computing unit, the second plurality of quantized activation values;

receiving, by the fourth computing unit, the second quantized convolutional kernel;

computing, by the fourth computing unit, a fourth dot product of the second plurality of activation mantissa values and the second plurality of kernel mantissa values; and

computing, by the fourth computing unit, a fourth sum of the second shared activation exponent and the second shared kernel exponent,

wherein the first plurality of quantized activation values are received by the first and second computing units, but not by the third and fourth computing units,

wherein the second plurality of quantized activation values are received by the third and fourth computing units, but not by the first and second computing units,

wherein the first quantized convolutional kernel is received by the first and third computing units, but not by the second and fourth computing units, and

wherein the second quantized convolutional kernel is received by the second and fourth computing units, but not by the first and third computing units.
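Claim 1 can be read as a 2x2 grid of computing units working on block-floating-point data. Each unit forms an integer dot product of mantissas and adds the two shared exponents. Each activation block is delivered to one row of units, and each kernel is delivered to one column. The Python sketch below illustrates that reading under stated assumptions; it is not the patented implementation. The claim does not say how mantissas and shared exponents are derived, so the quantizer is hypothetical: 8-bit signed mantissas with an exponent chosen from the largest magnitude. All function names are illustrative and are not taken from the patent: `quantize_block`, `compute_unit`, `to_float`, `window` and `run_grid`. The same applies to the feature map and kernels. The claim's actual matrix values are not reproduced here, so two stride-1 3x3 windows stand in for the first and second matrices; these windows share six values, as the claim requires.

```python
import math
from typing import Dict, List, Tuple

# A block-floating-point block: (integer mantissas, one shared exponent).
Block = Tuple[List[int], int]


def quantize_block(values: List[float], bits: int = 8) -> Block:
    """Hypothetical quantizer (the claim does not specify one).

    Picks the smallest shared exponent e such that every value / 2**e
    fits in a signed `bits`-bit mantissa, then rounds each value.
    """
    max_mant = (1 << (bits - 1)) - 1  # 127 for 8-bit mantissas
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0] * len(values), 0
    exp = math.ceil(math.log2(peak / max_mant))
    scale = 2.0 ** exp
    mants = [max(-max_mant, min(max_mant, round(v / scale))) for v in values]
    return mants, exp


def compute_unit(act: Block, kernel: Block) -> Tuple[int, int]:
    """One computing unit: integer dot product of the mantissas and the
    sum of the two shared exponents, mirroring each unit's steps in claim 1."""
    (a_mants, a_exp), (k_mants, k_exp) = act, kernel
    dot = sum(a * k for a, k in zip(a_mants, k_mants))
    return dot, a_exp + k_exp


def to_float(result: Tuple[int, int]) -> float:
    """Value represented by a (dot product, exponent sum) pair."""
    dot, exp = result
    return dot * 2.0 ** exp


def window(matrix: List[List[float]], col: int, size: int = 3) -> List[float]:
    """Row-major flattening of the size x size window starting at column `col`."""
    return [matrix[r][c] for r in range(size) for c in range(col, col + size)]


def run_grid(acts: List[Block], kernels: List[Block]) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Unit (r, c) receives only activation block r and kernel c, so each
    activation block is shared along a row and each kernel along a column."""
    return {(r, c): compute_unit(acts[r], kernels[c])
            for r in range(len(acts)) for c in range(len(kernels))}


# Hypothetical input: two stride-1 3x3 windows of a 3x4 feature map.
feature_map = [[1.0, 2.0, 3.0, 4.0],
               [5.0, 6.0, 7.0, 8.0],
               [9.0, 10.0, 11.0, 12.0]]
w1, w2 = window(feature_map, 0), window(feature_map, 1)
shared = [v for v in w1 if v in w2]  # the two overlapping columns: 6 values

k1 = [1.0, 0.0, -1.0, 2.0, 0.0, -2.0, 1.0, 0.0, -1.0]  # hypothetical kernel
k2 = [0.5] * 9                                          # hypothetical kernel

# Units: first = (0, 0), second = (0, 1), third = (1, 0), fourth = (1, 1).
results = run_grid([quantize_block(w1), quantize_block(w2)],
                   [quantize_block(k1), quantize_block(k2)])
for unit, res in sorted(results.items()):
    print(unit, res, to_float(res))
```

With these inputs the four units report (-2048, -8), (27648, -10), (-2048, -8) and (32256, -10). These correspond to -8.0, 27.0, -8.0 and 31.5, which match the unquantized dot products because every input here is exactly representable.

The sketch keeps the exponent sum separate from the integer dot product, so each unit needs only integer multiply-accumulate and a small addition, and the scaling by 2**exp is deferred to `to_float`. The row and column routing means each activation block and each kernel is fetched once and fanned out to two units.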