US 11,961,007 B2
Split network acceleration architecture
Colin Beaton Verrilli, Apex, NC (US); Rashid Ahmed Akbar Attar, San Diego, CA (US); and Raghavendar Bhavansikar, Erie, CO (US)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Feb. 5, 2020, as Appl. No. 16/783,047.
Claims priority of provisional application 62/802,150, filed on Feb. 6, 2019.
Prior Publication US 2020/0250545 A1, Aug. 6, 2020
Int. Cl. G06N 3/10 (2006.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01); G06N 5/04 (2023.01)
CPC G06N 3/10 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 5/04 (2013.01); G06N 3/044 (2023.01); G06N 3/063 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method for accelerating machine learning on a computing device comprising a first inference accelerator, a second inference accelerator, and a host processor, the method comprising:
implementing, in memory of the host processor, a first request queue associated with the first inference accelerator, the first request queue receiving a pointer to inference data received at the host processor, the inference data being transmitted from the host processor, via a virtual channel, to the first inference accelerator for processing in response to the first request queue receiving the pointer;
implementing, in the memory of the host processor, a completion queue for receiving a final inference request result, via a first direct memory access (DMA) transfer, in response to transmitting the inference data;
hosting a neural network in the first inference accelerator and the second inference accelerator, the neural network split between the first inference accelerator and the second inference accelerator;
routing intermediate inference request results directly between the first inference accelerator and the second inference accelerator, the routing of the intermediate inference request results being performed independently of the host processor;
generating a final inference request result from the intermediate inference request results; and
implementing, in the first inference accelerator or the second inference accelerator, a global synchronization manager (GSM) to route the final inference request result to the host processor, via the first DMA transfer, in accordance with satisfying a pre-synchronization condition.
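 
The following is a minimal, hypothetical sketch of the data flow recited in claim 1, written in Python. Every name in it (HostProcessor, InferenceAccelerator, VirtualChannel, dma_write, serve_one_request) is an illustrative stand-in chosen for this sketch, not terminology from the patent; the hardware request and completion queues, virtual channel, DMA engine, and global synchronization manager are modeled with plain Python objects under those assumptions.

# Illustrative sketch only: models the claim-1 flow with plain Python objects.
# All class and function names are hypothetical and do not come from the patent.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class VirtualChannel:
    """Models the host-to-accelerator virtual channel as a simple FIFO."""
    fifo: deque = field(default_factory=deque)

    def send(self, payload):
        self.fifo.append(payload)

    def receive(self):
        return self.fifo.popleft()

class InferenceAccelerator:
    """One of two accelerators hosting a partition of the split neural network."""
    def __init__(self, layers, peer=None, is_gsm=False):
        self.layers = layers   # the portion of the network hosted on this accelerator
        self.peer = peer       # direct link to the other accelerator (bypasses the host)
        self.is_gsm = is_gsm   # whether this accelerator hosts the global synchronization manager

    def run_partition(self, activations):
        # Run only the layers assigned to this accelerator.
        for layer in self.layers:
            activations = layer(activations)
        return activations

class HostProcessor:
    """Host with a request queue (pointers to inference data) and a completion queue."""
    def __init__(self, channel):
        self.request_queue = deque()     # receives pointers to inference data
        self.completion_queue = deque()  # filled by DMA with the final inference result
        self.memory = {}                 # host memory, addressed by pointer
        self.channel = channel

    def submit(self, pointer, data):
        # Enqueuing the pointer triggers transmission of the referenced
        # inference data over the virtual channel to the first accelerator.
        self.memory[pointer] = data
        self.request_queue.append(pointer)
        self.channel.send(self.memory[pointer])

def dma_write(host, result):
    # Stand-in for the first DMA transfer of the final inference request
    # result into the host completion queue.
    host.completion_queue.append(result)

def serve_one_request(host, acc0, acc1):
    """End-to-end flow for one inference request, following claim 1."""
    inference_data = host.channel.receive()            # data arrives at the first accelerator
    intermediate = acc0.run_partition(inference_data)  # first partition of the network
    # Intermediate inference request results are routed directly between the
    # accelerators, independently of the host processor.
    final_result = acc1.run_partition(intermediate)    # second partition of the network
    # The GSM (here hosted on the second accelerator) routes the final result
    # to the host via DMA once a pre-synchronization condition is satisfied.
    pre_sync_condition_met = True                      # placeholder for the actual condition
    if acc1.is_gsm and pre_sync_condition_met:
        dma_write(host, final_result)

# Usage: split a toy two-layer "network" across the two accelerators.
double = lambda x: [2 * v for v in x]
increment = lambda x: [v + 1 for v in x]
channel = VirtualChannel()
host = HostProcessor(channel)
acc1 = InferenceAccelerator(layers=[increment], is_gsm=True)
acc0 = InferenceAccelerator(layers=[double], peer=acc1)
host.submit(pointer=0x1000, data=[1, 2, 3])
serve_one_request(host, acc0, acc1)
print(host.completion_queue.popleft())   # -> [3, 5, 7]

As the claim language indicates, keeping the intermediate results on the accelerator-to-accelerator path means the host processor touches only the initial pointer enqueue and the final DMA-delivered result, not the per-partition activations.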