CPC G06F 9/3879 (2013.01) [G06F 9/30174 (2013.01); G06F 9/3836 (2013.01); G06F 9/3851 (2013.01); G06F 9/3877 (2013.01); G06F 15/7807 (2013.01); G06F 17/16 (2013.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01); G06F 9/3001 (2013.01); G06F 15/7864 (2013.01); G06F 15/8023 (2013.01); G06F 2212/602 (2013.01); G06N 5/04 (2013.01); G06N 20/20 (2019.01)]
29 Claims
1. A programmable hardware system for machine learning (ML), comprising:
a core configured to
receive, from a host, a plurality of commands and data to be analyzed and inferred via machine learning;
divide the plurality of commands into a first subset of commands associated with performance-critical operations and a second subset of commands associated with performance-noncritical operations, wherein the performance-critical operations include at least one or more of a matrix operation, a tanh operation, a sigmoid operation, a memory transpose operation, an addition operation, and operations on one or more of a tree, a graph, and a priority queue, wherein the performance-noncritical operations include at least one or more of data collection and data mapping, wherein the performance-critical operations exclude any of data collection and data mapping, wherein the performance-noncritical operations exclude any of a matrix operation, a tanh operation, a sigmoid operation, a memory transpose operation, an addition operation, and operations on one or more of a tree, a graph, and a priority queue;
transmit each command of the first subset of commands and associated data thereof to an inference engine for processing via a function call, wherein each command of the first subset of commands and/or the associated data are encapsulated as parameters in the function call,
wherein the second subset of commands associated with performance-noncritical operations is not transmitted to the inference engine;
an instruction streaming engine coupled to the core and further coupled to the inference engine, wherein the instruction streaming engine is configured to
retrieve and maintain each command of the first subset of commands and/or the associated data from the function call at a specific location in a buffer;
stream each command of the first subset of commands and/or its associated data to the inference engine from the buffer; and
the inference engine configured to
retrieve each command of the first subset of commands and/or its associated data streamed from the buffer;
perform the performance-critical operations according to each command of the first subset of commands;
analyze the data; and
infer a subject from the data.
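To make the claimed command split concrete, the following is a minimal host-side sketch in C++. Every name in it (Op, Command, isPerformanceCritical, streamToInferenceEngine, dispatch) is a hypothetical illustration rather than the patent's implementation: commands in the first subset are forwarded to the inference engine through a function call whose parameters encapsulate the command and its data, while commands in the second subset are handled on the core and never transmitted.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical opcode set mirroring the operations named in the claim.
enum class Op : std::uint8_t {
    MatMul, Tanh, Sigmoid, Transpose, Add,  // performance-critical
    DataCollect, DataMap                    // performance-noncritical
};

struct Command {
    Op op;
    std::vector<float> data;  // data to be analyzed and inferred
};

// Classifies a command into the first (critical) or second (noncritical)
// subset, matching the inclusion/exclusion lists in the claim.
static bool isPerformanceCritical(Op op) {
    switch (op) {
        case Op::MatMul: case Op::Tanh: case Op::Sigmoid:
        case Op::Transpose: case Op::Add:
            return true;
        default:
            return false;  // data collection / data mapping stay on the core
    }
}

// Assumed entry point into the instruction streaming engine: the command and
// its data are encapsulated as the parameters of this function call.
void streamToInferenceEngine(Op op, const std::vector<float>& data) {
    std::cout << "streamed critical op with " << data.size() << " operands\n";
    (void)op;  // a real engine would place the command in its buffer here
}

void dispatch(const std::vector<Command>& fromHost) {
    for (const Command& cmd : fromHost) {
        if (isPerformanceCritical(cmd.op))
            streamToInferenceEngine(cmd.op, cmd.data);  // first subset
        else
            std::cout << "core handles noncritical op locally\n";  // second subset
    }
}

int main() {
    dispatch({{Op::MatMul, {1.f, 2.f}}, {Op::DataCollect, {}}});
}
```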
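The buffer handoff between the instruction streaming engine and the inference engine can likewise be sketched. The single-producer/single-consumer ring buffer below is an assumed design, not the claimed hardware: push models the streaming engine maintaining each command at a specific location in the buffer, and pop models the inference engine retrieving each streamed command, with elementwise tanh and sigmoid standing in for the performance-critical operations it performs.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

struct StreamedCommand {
    int op;                  // 0 = tanh, 1 = sigmoid (assumed encoding)
    std::vector<float> data;
};

class CommandBuffer {
    static constexpr std::size_t kSlots = 64;  // assumed capacity
    std::array<std::optional<StreamedCommand>, kSlots> slots_;
    std::size_t head_ = 0, tail_ = 0;

public:
    // Streaming-engine side: maintain the command at a specific slot.
    bool push(StreamedCommand cmd) {
        if ((head_ + 1) % kSlots == tail_) return false;  // buffer full
        slots_[head_] = std::move(cmd);
        head_ = (head_ + 1) % kSlots;
        return true;
    }
    // Inference-engine side: retrieve the next streamed command.
    std::optional<StreamedCommand> pop() {
        if (tail_ == head_) return std::nullopt;          // buffer empty
        std::optional<StreamedCommand> cmd = std::move(slots_[tail_]);
        slots_[tail_].reset();
        tail_ = (tail_ + 1) % kSlots;
        return cmd;
    }
};

// Drains the buffer and performs each command's operation on its data.
void drain(CommandBuffer& buf) {
    while (auto cmd = buf.pop()) {
        for (float& x : cmd->data)
            x = (cmd->op == 0) ? std::tanh(x) : 1.f / (1.f + std::exp(-x));
        std::cout << "executed op " << cmd->op << '\n';
    }
}

int main() {
    CommandBuffer buf;
    buf.push({0, {0.5f, -0.5f}});  // tanh command
    buf.push({1, {2.0f}});         // sigmoid command
    drain(buf);
}
```

The fixed-slot ring models why the claim speaks of "a specific location in a buffer": producer and consumer coordinate only through head and tail indices, so each command occupies a known slot from the time it is streamed until the inference engine retrieves it.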