US 11,915,147 B2
Large model support in deep learning
Minsik Cho, Austin, TX (US); Ulrich Alfons Finkler, Mahopac, NY (US); Vladimir Zolotov, Putnam Valley, NY (US); and David S. Kung, Chappaqua, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Oct. 20, 2022, as Appl. No. 18/048,203.
Application 18/048,203 is a continuation of application No. 16/180,864, filed on Nov. 5, 2018, granted, now Pat. No. 11,526,759.
Prior Publication US 2023/0064057 A1, Mar. 2, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 15/82 (2006.01); G06N 3/084 (2023.01); G06F 13/42 (2006.01); G06N 3/04 (2023.01)
CPC G06N 3/084 (2013.01) [G06F 13/4282 (2013.01); G06N 3/04 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A graphics processing unit, comprising:
a graphics processing unit cache memory,
wherein the graphics processing unit is communicatively coupled to a central processing unit comprising a central processing unit cache memory,
wherein the graphics processing unit, during a forward pass process of training a deep neural network that traverses through a set of layers of the deep neural network from a first layer of the set of layers to a last layer of the set of layers, transmits, to the central processing unit for storage in the central processing unit cache memory, data from the graphics processing unit cache memory employed for the training by an intermediate layer of the set of layers between the first layer and the last layer, and
wherein the graphics processing unit has determined that at least a portion of the data will be employed by the intermediate layer during a backward pass process of training the deep neural network that traverses from the last layer to the first layer.
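The claimed scheme moves intermediate-layer activations out of GPU memory into CPU memory during the forward pass, then brings them back when the backward pass reaches the layer that reuses them. The sketch below simulates that offload/fetch protocol with plain Python objects standing in for the two cache memories; all class and function names (`CpuCache`, `GpuCache`, `forward_pass`, `backward_pass`) are illustrative assumptions, not terms from the patent, and no real GPU transfer is performed.

```python
# Hypothetical sketch of the claimed large-model-support scheme.
# Simulated stand-ins are used for the GPU and CPU cache memories;
# nothing here is taken from the patent's actual implementation.

class CpuCache:
    """Stands in for the central processing unit cache memory."""
    def __init__(self):
        self.store = {}

    def receive(self, layer_idx, activation):
        self.store[layer_idx] = activation

    def send_back(self, layer_idx):
        return self.store.pop(layer_idx)


class GpuCache:
    """Stands in for the graphics processing unit cache memory."""
    def __init__(self, cpu_cache):
        self.resident = {}          # activations currently held on the GPU
        self.cpu_cache = cpu_cache

    def offload(self, layer_idx):
        # Transmit the activation to the CPU cache and free GPU memory.
        self.cpu_cache.receive(layer_idx, self.resident.pop(layer_idx))

    def fetch(self, layer_idx):
        # Bring a previously offloaded activation back from the CPU cache.
        self.resident[layer_idx] = self.cpu_cache.send_back(layer_idx)
        return self.resident[layer_idx]


def forward_pass(layers, x, gpu, needed_in_backward):
    """Traverse layers first -> last, offloading intermediate activations."""
    for i, layer in enumerate(layers):
        x = layer(x)
        gpu.resident[i] = x
        # An intermediate layer (between the first and last) whose activation
        # the backward pass will reuse is sent to the CPU cache rather than
        # continuing to occupy GPU memory.
        if 0 < i < len(layers) - 1 and i in needed_in_backward:
            gpu.offload(i)
    return x


def backward_pass(layers, gpu, needed_in_backward):
    """Traverse layers last -> first, fetching offloaded activations back."""
    for i in reversed(range(len(layers))):
        if i in needed_in_backward and i not in gpu.resident:
            gpu.fetch(i)


if __name__ == "__main__":
    cpu = CpuCache()
    gpu = GpuCache(cpu)
    layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    needed = {1}  # the middle layer's activation is reused in backward
    out = forward_pass(layers, 5, gpu, needed)
    assert out == 9                                   # (5+1)*2 - 3
    assert 1 in cpu.store and 1 not in gpu.resident   # offloaded to CPU
    backward_pass(layers, gpu, needed)
    assert 1 in gpu.resident and 1 not in cpu.store   # fetched back
    print("offload/fetch sketch ok")
```

In a real system the transfers would be asynchronous device-to-host copies (for example, over PCIe) overlapped with computation, so the offload hides behind the forward work of later layers and the fetch behind the backward work of earlier ones; the sketch only shows the bookkeeping of which memory holds each activation at each point.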