US 12,307,350 B2
Systems and methods for hardware-based pooling
Peter Joseph Bannon, Woodside, CA (US); and Kevin Altair Hurd, Redwood City, CA (US)
Assigned to Tesla, Inc., Austin, TX (US)
Filed by Tesla, Inc., Palo Alto, CA (US)
Filed on Jan. 4, 2018, as Appl. No. 15/862,369.
Prior Publication US 2019/0205738 A1, Jul. 4, 2019
Int. Cl. G06N 3/063 (2023.01); G06F 18/24 (2023.01); G06N 20/00 (2019.01); G06V 10/82 (2022.01); G06V 30/19 (2022.01)
CPC G06N 3/063 (2013.01) [G06F 18/24 (2023.01); G06N 20/00 (2019.01); G06V 10/82 (2022.01); G06V 30/19173 (2022.01)]
18 Claims
 
1. A pooling unit comprising one or more sub-circuits, wherein the pooling unit is configured to:
reformat input data into a plurality of rows to generate a pooling array, the input data being a linearized array, which includes a plurality of groups of neighborhood values, and a number of the rows corresponding to a size of a pooling kernel, wherein to reformat the pooling unit is configured to shift, over a number of arithmetic cycles, the rows relative to each other to reformat the input data, wherein the input data is received from a matrix processor and reflects an output channel of a convolutional layer, wherein the pooling array is reformatted in grid format such that one or more elements of an individual group are aligned in a horizontal direction and one or more elements of the individual group are aligned in a vertical direction, wherein the rows are shifted according to a shift of a pooling window corresponding to a stride having a value defined by the number of arithmetic cycles and dictating how often pooling values are output,
wherein each row comprises a portion of the input data that correspond to an individual group of neighborhood values of the plurality of groups of neighborhood values, wherein individual pooling operations are applied to individual groups of neighborhood values; and
apply in subsequent arithmetic cycles, according to the reformatted grid format, the individual pooling operations to the individual groups of neighborhood values to obtain a pooling output associated with the output channel, wherein the pooling unit is configured to output a pooling value every number of arithmetic cycles corresponding to the value of the stride.
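
The reformatting recited in claim 1 can be illustrated with a small software analogy. The sketch below is not the claimed circuit: it assumes a row-major linearized output channel, a square pooling kernel, and max pooling, and the names `reformat_rows`, `pool_max`, `width`, `kernel`, and `stride` are hypothetical. It builds one row per kernel row by shifting the linearized data by multiples of the image width, so that the elements of a neighborhood line up both horizontally and vertically, and then emits one pooling value per stride step.

```python
def reformat_rows(flat, width, kernel):
    """Build `kernel` rows from a row-major linearized array.

    Row r is the input shifted by r * width, so column i of the result
    holds the elements that sit directly below flat[i] in the 2-D image.
    (Assumption: software stand-in for the shifting done in hardware.)
    """
    n = len(flat) - (kernel - 1) * width
    return [flat[r * width : r * width + n] for r in range(kernel)]


def pool_max(flat, width, kernel, stride):
    """Apply max pooling to each neighborhood using the reformatted rows."""
    height = len(flat) // width
    rows = reformat_rows(flat, width, kernel)
    out = []
    # The window advances by `stride` positions; one value is output per step.
    for y in range(0, height - kernel + 1, stride):
        out_row = []
        for x in range(0, width - kernel + 1, stride):
            base = y * width + x
            window = [rows[r][base + c]
                      for r in range(kernel)
                      for c in range(kernel)]
            out_row.append(max(window))
        out.append(out_row)
    return out


# A 4x4 channel holding 0..15, pooled with a 2x2 kernel and stride 2.
channel = list(range(16))
pooled = pool_max(channel, width=4, kernel=2, stride=2)
```

For this input the two reformatted rows are the elements 0 to 11 and 4 to 15, and `pooled` is `[[5, 7], [13, 15]]`. A different pooling operation, such as averaging, could be substituted for `max` without changing the reformatting step.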