US 12,333,351 B2
Synchronization of processing elements that execute statically scheduled instructions in a machine learning accelerator
Nishit Shah, Sunnyvale, CA (US); Srivathsa Dhruvanarayan, Saratoga, CA (US); and Reed Kotler, San Jose, CA (US)
Assigned to SiMa Technologies, Inc., San Jose, CA (US)
Filed by SiMa Technologies, Inc., San Jose, CA (US)
Filed on Apr. 17, 2020, as Appl. No. 16/852,338.
Prior Publication US 2021/0326189 A1, Oct. 21, 2021
Int. Cl. G06F 9/52 (2006.01); G06F 17/16 (2006.01); G06N 3/04 (2023.01); G06N 3/10 (2006.01)

CPC G06F 9/52 (2013.01) [G06F 17/16 (2013.01); G06N 3/04 (2013.01); G06N 3/10 (2013.01)]

21 Claims

1. A method for implementing a machine learning network (MLN) by executing a computer program of instructions on a machine learning accelerator (MLA) comprising a plurality of interconnected processing elements, the instructions partitioned into one or more non-deterministic phases and one or more deterministic phases, the method comprising:

executing a non-deterministic phase of the instructions;

determining that execution of the non-deterministic phase has completed;

subject to the determination that the execution of the non-deterministic phase has completed, executing a deterministic phase of the instructions; wherein:

execution of the deterministic phase is dependent on completion of the non-deterministic phase,

the instructions in the deterministic phase include both instructions for processing elements to perform computations (compute instructions) and instructions to transfer data between the processing elements (data transfer instructions),

the compute and data transfer instructions in the deterministic phase are executed concurrently by the processing elements according to a static schedule relative to the other compute and data transfer instructions in the deterministic phase with unconditional start times for every compute and data transfer instruction within the deterministic phase, and

the static schedule for the concurrent execution of the compute and data transfer instructions in the deterministic phase is determined by a compiler before run-time and does not depend on run-time conditions, branching or values of inputs to the instructions; and

prior to executing the deterministic phase, synchronizing the plurality of processing elements upon completion of the non-deterministic phase for execution of the statically scheduled instructions.
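The ordering recited in claim 1 can be illustrated with a minimal cycle-level sketch in Python. It is not the patentee's implementation. The names `Instr`, `compile_static_schedule` and `run_program`, the model of each processing element and each inter-element link as a separate hardware unit, and all latencies are hypothetical assumptions chosen for illustration. The sketch works in three steps. A "compiler" pass fixes an unconditional start offset for every compute and data transfer instruction using only fixed latencies and dependencies. The run-time model lets each processing element finish the non-deterministic phase at an unpredictable cycle. All elements then synchronize at the latest of those cycles before the statically scheduled deterministic phase begins.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Instr:
    name: str
    unit: str                    # hypothetical hardware unit, e.g. "pe0" or "link0->1"
    kind: str                    # "compute" or "transfer"
    latency: int                 # fixed cycle count, known to the compiler
    deps: Tuple[str, ...] = ()   # instructions whose results this one consumes

def compile_static_schedule(program: List[Instr]) -> Dict[str, int]:
    """Compile-time pass: assign every instruction an unconditional start offset,
    in cycles after the deterministic phase begins. Only fixed latencies and
    dependencies are used, never input values, branches or run-time conditions.
    Assumes `program` is listed in dependency (topological) order."""
    finish: Dict[str, int] = {}
    unit_free: Dict[str, int] = {}
    offsets: Dict[str, int] = {}
    for ins in program:
        # Earliest cycle at which all operands are available.
        ready = max((finish[d] for d in ins.deps), default=0)
        # The instruction must also wait until its unit is idle.
        start = max(ready, unit_free.get(ins.unit, 0))
        offsets[ins.name] = start
        finish[ins.name] = start + ins.latency
        unit_free[ins.unit] = finish[ins.name]
    return offsets

def run_program(nondet_done: Dict[str, int], offsets: Dict[str, int]) -> Dict[str, int]:
    """Run-time model. `nondet_done` maps each processing element to the cycle at
    which it finished the non-deterministic phase (not known before run-time)."""
    # Synchronize: the deterministic phase begins only after every element is done.
    sync_cycle = max(nondet_done.values())
    # Deterministic phase: every start time is sync_cycle plus a precompiled offset,
    # with no run-time checks or branches.
    return {name: sync_cycle + off for name, off in offsets.items()}

# Two concurrent computes, a transfer of mm0's result to pe1, then a dependent compute.
program = [
    Instr("mm0", "pe0", "compute", 4),
    Instr("mm1", "pe1", "compute", 4),
    Instr("xfer01", "link0->1", "transfer", 2, ("mm0",)),
    Instr("add1", "pe1", "compute", 1, ("mm1", "xfer01")),
]
offsets = compile_static_schedule(program)   # {'mm0': 0, 'mm1': 0, 'xfer01': 4, 'add1': 6}
starts_a = run_program({"pe0": 13, "pe1": 9}, offsets)
starts_b = run_program({"pe0": 5, "pe1": 20}, offsets)
```

In this sketch the offsets are fixed by `compile_static_schedule` before any input is seen. A longer non-deterministic phase therefore only shifts the whole deterministic phase later (by 7 cycles between the two runs above). The relative timing of the compute and data transfer instructions never changes.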