US 11,704,548 B2
Multicast network and memory transfer optimizations for neural network hardware acceleration
Jeremy Bruestle, Seattle, WA (US); and Choong Ng, Seattle, WA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Aug. 10, 2021, as Appl. No. 17/444,752.
Application 17/444,752 is a division of application No. 15/588,569, filed on May 5, 2017, granted, now 11,120,329.
Claims priority of provisional application 62/333,214, filed on May 7, 2016.
Prior Publication US 2021/0374512 A1, Dec. 2, 2021
Int. Cl. G06N 3/063 (2023.01); G06F 12/06 (2006.01); G06F 9/345 (2018.01); H04L 49/15 (2022.01); G06N 3/04 (2023.01); H04L 15/00 (2006.01)
CPC G06N 3/063 (2013.01) [G06F 9/345 (2013.01); G06F 12/06 (2013.01); G06N 3/04 (2013.01); H04L 49/1507 (2013.01); H04L 15/00 (2013.01)] 17 Claims
 
15. A method comprising:
receiving at a modulo address generator, a memory address and a length;
generating by the modulo address generator, a set of memory addresses corresponding to data elements stored in a computer readable memory separated by strides;
receiving at an enqueuing controller, the generated set of memory addresses;
queuing by the enqueuing controller addresses of data elements into corresponding address queues respectively, and concurrently queuing by the enqueuing controller control data in a control queue;
decoding the memory addresses of the data elements into data at a decoder and queuing into a corresponding data queue respectively;
at a dequeuing controller, receiving data elements from the data queues and forwarding the received data elements to a reverse permutation, based at least on the control data received from the control queue;
restoring via the reverse permutation, the data elements received from the dequeuing controller; and
forwarding at the reverse permutation, the restored data elements to an output.
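
The data flow recited in claim 15 can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the patented hardware: the memory is assumed to be split into NUM_BANKS banks selected by address modulo NUM_BANKS, the stride is assumed to be supplied as a parameter alongside the memory address and length, addresses are assumed to be handled in groups of up to NUM_BANKS, and a bank conflict within a group is simply rejected. The names generate_addresses, strided_read, and NUM_BANKS are hypothetical and do not appear in the claim.

```python
from collections import deque

NUM_BANKS = 5  # assumed bank count (hypothetical); not taken from the claim


def generate_addresses(base, length, stride, mem_size):
    """Modulo address generator: `length` addresses spaced `stride` apart,
    wrapped modulo the memory size (the wrap-around is an assumption)."""
    return [(base + i * stride) % mem_size for i in range(length)]


def strided_read(memory, base, length, stride):
    """Model of the claimed flow: generate, enqueue, decode, dequeue, restore."""
    addresses = generate_addresses(base, length, stride, len(memory))

    addr_queues = [deque() for _ in range(NUM_BANKS)]  # one address queue per bank
    data_queues = [deque() for _ in range(NUM_BANKS)]  # one data queue per bank
    control_queue = deque()                            # one entry per address group

    # Enqueuing controller: route each address to its bank's address queue and,
    # for the same group, record the bank order as control data.
    for start in range(0, len(addresses), NUM_BANKS):
        group = addresses[start:start + NUM_BANKS]
        banks = [addr % NUM_BANKS for addr in group]   # assumed bank mapping
        if len(set(banks)) != len(banks):
            raise ValueError("bank conflict within a group; not handled in this sketch")
        for addr, bank in zip(group, banks):
            addr_queues[bank].append(addr)
        control_queue.append(banks)

    # Decoder: resolve each queued address to its data element and place the
    # element in the data queue of the same bank.
    for bank in range(NUM_BANKS):
        while addr_queues[bank]:
            data_queues[bank].append(memory[addr_queues[bank].popleft()])

    output = []
    while control_queue:
        banks = control_queue.popleft()
        # Dequeuing controller: pull one element from each bank named in the
        # control data; the elements arrive in bank order, not request order.
        by_bank = {bank: data_queues[bank].popleft() for bank in sorted(banks)}
        # Reverse permutation: restore the original request order.
        output.extend(by_bank[bank] for bank in banks)
    return output


memory = list(range(100, 140))  # 40 words; the word at address a holds 100 + a
print(strided_read(memory, base=3, length=7, stride=3))
# prints [103, 106, 109, 112, 115, 118, 121]
```

In this model a stride of 3 with five banks sends each group of five consecutive requests to five different banks, so no conflict arises. A stride of 5 would map every request to the same bank and raise the error, which is one reason a banked design might favor a bank count that shares no factor with common strides.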