US 12,112,175 B1
Method and apparatus for performing machine learning operations in parallel on machine learning hardware
Ulf Hanebutte, Gig Harbor, WA (US); and Avinash Sodani, San Jose, CA (US)
Assigned to Marvell Asia Pte Ltd, Singapore (SG)
Filed by Marvell Asia Pte Ltd, Singapore (SG)
Filed on Feb. 2, 2022, as Appl. No. 17/590,994.
Application 17/590,994 is a continuation in part of application No. 17/511,111, filed on Oct. 26, 2021.
Application 17/590,994 is a continuation in part of application No. 17/248,045, filed on Jan. 6, 2021.
Application 17/511,111 is a continuation of application No. 16/226,508, filed on Dec. 19, 2018, granted, now 11,086,633.
Claims priority of provisional application 63/282,557, filed on Nov. 23, 2021.
Claims priority of provisional application 63/105,861, filed on Oct. 26, 2020.
Claims priority of provisional application 62/675,076, filed on May 22, 2018.
Claims priority of provisional application 62/644,352, filed on Mar. 16, 2018.
Claims priority of provisional application 62/628,130, filed on Feb. 8, 2018.
Int. Cl. G06F 8/41 (2018.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06F 15/78 (2006.01); G06F 17/16 (2006.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01); G06F 15/80 (2006.01); G06N 5/04 (2023.01); G06N 20/20 (2019.01)

CPC G06F 9/3879 (2013.01) [G06F 8/453 (2013.01); G06F 9/30174 (2013.01); G06F 9/3836 (2013.01); G06F 9/3851 (2013.01); G06F 9/3877 (2013.01); G06F 15/7807 (2013.01); G06F 17/16 (2013.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01); G06F 9/3001 (2013.01); G06F 15/7864 (2013.01); G06F 15/8023 (2013.01); G06F 2212/602 (2013.01); G06N 5/04 (2013.01); G06N 20/20 (2019.01)]

15 Claims

1. A computer implemented method, comprising:

receiving a set of data;

dividing the set of data into a plurality of data portions;

transmitting the plurality of data portions to a plurality of processing tiles, wherein each data portion of the plurality of data portions is associated with a processing tile of a plurality of tiles;

performing by each processing tile of the plurality of tiles at least one local operation on its respective data portion to form a local maxima;

exchanging local maximas between the plurality of processing tiles;

calculating a global maximum value based on the local maximas;

performing by each processing tile of the plurality of tiles a subtraction operation of the global maximum value from each data input in its respective data portion to form a subtraction result;

performing by each processing tile of the plurality of tiles an exponential operation on the subtraction result to form exponential results;

forming a sum of the exponential results from the plurality of tiles; and

inverting the sum to form a scaled value.
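
The steps recited in claim 1 resemble the max-subtraction form of a softmax normalization carried out across several workers. The following is a minimal single-process sketch of that sequence, not the patented implementation: the processing tiles are simulated as plain Python lists, and the names `split_into_portions` and `tiled_scaled_value`, the contiguous partitioning scheme, and the per-tile partial sums are all assumptions made for illustration.

```python
import math


def split_into_portions(data, num_tiles):
    """Divide data into num_tiles contiguous portions (hypothetical scheme).

    Portion sizes differ by at most one element.
    """
    if num_tiles < 1 or len(data) < num_tiles:
        raise ValueError("need at least one data input per tile")
    base, extra = divmod(len(data), num_tiles)
    portions, start = [], 0
    for tile in range(num_tiles):
        size = base + (1 if tile < extra else 0)
        portions.append(data[start:start + size])
        start += size
    return portions


def tiled_scaled_value(data, num_tiles):
    """Simulate the claimed steps; returns (scaled_value, exponential_results)."""
    # Divide the data and "transmit" one portion to each simulated tile.
    portions = split_into_portions(data, num_tiles)

    # Each tile performs a local operation to form its local maximum.
    local_maxima = [max(portion) for portion in portions]

    # Exchanging local maxima is modeled here as every tile seeing the
    # full list; the global maximum is then the largest local maximum.
    global_max = max(local_maxima)

    # Each tile subtracts the global maximum from its inputs and
    # exponentiates the subtraction results.
    exp_results = [
        [math.exp(x - global_max) for x in portion] for portion in portions
    ]

    # Form the sum of exponential results across tiles (assumed here to be
    # per-tile partial sums that are then combined), and invert it.
    total = sum(sum(tile_exps) for tile_exps in exp_results)
    return 1.0 / total, exp_results


if __name__ == "__main__":
    scale, exps = tiled_scaled_value([1.0, 2.0, 3.0, 4.0], num_tiles=2)
    # Multiplying by the scaled value is a step beyond claim 1, shown only
    # to illustrate that the products sum to one.
    normalized = [e * scale for tile_exps in exps for e in tile_exps]
    print(sum(normalized))
```

Subtracting the global maximum before exponentiation keeps every exponent at or below zero, so large inputs do not overflow, and the scaled value does not depend on how many tiles the data is divided across (up to floating-point rounding).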