US 11,915,139 B2
Modifying machine learning models to improve locality
Doe Hyun Yoon, Foster City, CA (US); Nishant Patil, Sunnyvale, CA (US); and Norman Paul Jouppi, Palo Alto, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Feb. 15, 2022, as Appl. No. 17/672,163.
Application 17/672,163 is a continuation of application No. 16/156,573, filed on Oct. 10, 2018, granted, now 11,263,529.
Prior Publication US 2022/0172060 A1, Jun. 2, 2022
Int. Cl. G06N 3/08 (2023.01); G06N 3/082 (2023.01); G06N 3/084 (2023.01); G06N 3/10 (2006.01)

CPC G06N 3/082 (2013.01) [G06N 3/084 (2013.01); G06N 3/10 (2013.01)]

20 Claims

1. A method for improving locality of machine learning models, the method performed by data processing apparatus, the method comprising:

receiving data of a machine learning model, the data representing operations of the machine learning model;

receiving data specifying characteristics of a memory hierarchy for one or more machine learning processors on which the machine learning model is going to be deployed, the memory hierarchy including multiple memories for storing machine learning data used by the one or more machine learning processors when performing machine learning computations using the machine learning model, the characteristics including a data storage capacity of each memory and a memory bandwidth of each memory, wherein at least one of the memories has a different memory bandwidth than at least one other memory;

generating, based on the data of the machine learning model and the characteristics of the memory hierarchy, an updated machine learning model, the generating comprising:

determining that output data of a given operation of the machine learning model should be stored in a highest bandwidth memory of the multiple memories based on the machine learning model;

determining that the output data of the given operation has a data size that is larger than a data storage capacity of the highest bandwidth memory; and

in response to determining that the output data of the given operation has the data size that is larger than the data storage capacity of the highest bandwidth memory, adding, to the updated machine learning model, one or more operations for splitting the output data into multiple portions of output data such that each portion of output data has a data size that is less than or equal to the data storage capacity of the highest bandwidth memory; and

performing machine learning computations using the updated machine learning model.
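
The generating step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Memory`, `Operation`, `split_sizes`, and `update_model` are hypothetical, as are the byte-based sizes and the per-operation flag standing in for the determination that output data should be stored in the highest bandwidth memory. The sketch only shows one way that output data larger than the highest bandwidth memory could be divided into portions that each fit within its capacity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    """One level of the memory hierarchy (hypothetical representation)."""
    name: str
    capacity_bytes: int
    bandwidth_gbps: float


@dataclass
class Operation:
    """One model operation and the size of its output (hypothetical)."""
    name: str
    output_bytes: int
    # Assumption: the model analysis has already decided whether this
    # output should be stored in the highest bandwidth memory.
    wants_fastest_memory: bool = True


def highest_bandwidth_memory(hierarchy: List[Memory]) -> Memory:
    """Return the memory with the largest bandwidth."""
    return max(hierarchy, key=lambda m: m.bandwidth_gbps)


def split_sizes(total_bytes: int, capacity_bytes: int) -> List[int]:
    """Divide total_bytes into near-equal portions, each <= capacity_bytes."""
    if total_bytes <= 0 or capacity_bytes <= 0:
        raise ValueError("sizes must be positive")
    portions = -(-total_bytes // capacity_bytes)  # integer ceiling division
    base, extra = divmod(total_bytes, portions)
    return [base + 1 if i < extra else base for i in range(portions)]


def update_model(ops: List[Operation], hierarchy: List[Memory]) -> List[Operation]:
    """Return an updated operation list with split operations added."""
    fastest = highest_bandwidth_memory(hierarchy)
    updated: List[Operation] = []
    for op in ops:
        updated.append(op)
        if op.wants_fastest_memory and op.output_bytes > fastest.capacity_bytes:
            sizes = split_sizes(op.output_bytes, fastest.capacity_bytes)
            for i, size in enumerate(sizes):
                # Each added operation stands for one portion of the output.
                updated.append(Operation(f"{op.name}/split_{i}", size))
    return updated


# Illustrative values only; they do not come from the patent.
hierarchy = [Memory("on_chip", 4, 100.0), Memory("main", 100, 10.0)]
model = [Operation("matmul", 10), Operation("relu", 3)]
updated_model = update_model(model, hierarchy)
```

In this sketch the output of `matmul` exceeds the capacity of `on_chip`, so split operations are added for it, while `relu` is left unchanged because its output already fits.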