US 12,067,465 B2
Instruction streaming for a machine learning accelerator
Subba Rao Venkata Kalari, Cupertino, CA (US)
Assigned to SiMa Technologies, Inc., San Jose, CA (US)
Filed by SiMa Technologies, Inc., San Jose, CA (US)
Filed on Dec. 17, 2020, as Appl. No. 17/125,993.
Prior Publication US 2022/0198318 A1, Jun. 23, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 9/38 (2018.01); G06F 9/48 (2006.01); G06F 9/50 (2006.01)

CPC G06N 20/00 (2019.01) [G06F 9/3836 (2013.01); G06F 9/4881 (2013.01); G06F 9/5027 (2013.01); G06F 9/5061 (2013.01); G06F 9/5066 (2013.01)]

20 Claims

1. A method for implementing a machine learning network by executing a computer program of instructions on a machine learning accelerator (MLA) comprising a plurality of interconnected storage elements (SEs) and processing elements (PEs), the instructions partitioned into blocks, the method comprising:

retrieving a block k of instructions from off-chip memory, the block (a) comprising a set of statically scheduled deterministic instructions executed by the SEs and PEs, and (b) specifying a number Nk of non-deterministic instructions for block k that must execute prior to executing the block k of instructions, wherein the non-deterministic instructions are contained in prior blocks;

keeping a count of the number of non-deterministic instructions for block k executed; and

executing the block k of instructions, only after the count of executed non-deterministic instructions for block k has reached Nk, wherein an execution order of the statically scheduled deterministic instructions in the block k does not change as a result of run-time conditions, branching or dependence on input values.
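
The gating behavior recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `Block`, `Streamer`, `required_nondet`, `on_nondet_complete`, and the instruction strings are hypothetical, and the use of a single running counter shared across blocks is an assumption made for brevity (the claim only requires that a count kept for block k reach Nk). The sketch shows a block being held until the required number of non-deterministic instructions from prior blocks has executed, after which its deterministic instructions run in their fixed, statically scheduled order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical model of one instruction block k."""
    deterministic: List[str]   # statically scheduled instructions, fixed order
    required_nondet: int       # N_k: non-deterministic instructions that must finish first


class Streamer:
    """Hypothetical gate: runs a block only once its N_k threshold is met."""

    def __init__(self) -> None:
        self.completed = 0                      # count of executed non-deterministic instructions
        self.trace: List[Tuple[str, str]] = []  # record of what happened, in order

    def on_nondet_complete(self, name: str) -> None:
        # Called when a non-deterministic instruction (e.g. an off-chip load) finishes.
        self.completed += 1
        self.trace.append(("done", name))

    def execute(self, block: Block) -> bool:
        # Hold the block until the count has reached N_k.
        if self.completed < block.required_nondet:
            return False
        # Deterministic instructions run in their listed order, unconditionally.
        for instr in block.deterministic:
            self.trace.append(("exec", instr))
        return True


streamer = Streamer()
block0 = Block(deterministic=["pe_matmul_0"], required_nondet=0)
block1 = Block(deterministic=["se_shift_1", "pe_add_1"], required_nondet=2)

ran0 = streamer.execute(block0)            # N_0 = 0, so it runs immediately
held1 = streamer.execute(block1)           # N_1 = 2, count is 0, so it is held
streamer.on_nondet_complete("dram_load_a")
streamer.on_nondet_complete("dram_load_b")
ran1 = streamer.execute(block1)            # count is now 2, so it runs
```

In this sketch, the order of entries in `deterministic` never depends on run-time state; only the moment at which the whole block starts does, which mirrors the separation the claim draws between deterministic and non-deterministic instructions.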