US 12,487,763 B2
Method and apparatus with memory management and neural network operation
Jiseung Jang, Suwon-si (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Apr. 29, 2021, as Appl. No. 17/243,991.
Claims priority of application No. 10-2020-0188928 (KR), filed on Dec. 31, 2020.
Prior Publication US 2022/0206698 A1, Jun. 30, 2022
Int. Cl. G06F 3/06 (2006.01); G06N 3/02 (2006.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01)
CPC G06F 3/0638 (2013.01) [G06F 3/0604 (2013.01); G06F 3/0679 (2013.01); G06N 3/02 (2013.01); G06N 3/063 (2013.01); G06N 3/08 (2013.01)] 32 Claims
OG exemplary drawing
 
1. A processor-implemented memory management method, comprising:
controlling a global memory and one or more local memories associated with a device that is configured to perform training operations of a neural network for a training of the neural network, including controlling:
a storing, by the one or more local memories, of a calculated result of a first layer of the neural network in the one or more local memories during a forward propagation operation of the training with respect to training data propagated through an early layer to a later layer of the neural network;
a storing of a calculated gradient of the first layer to the global memory during a backward propagation operation of the training with respect to the training data, where the gradient is calculated based on the stored calculated result;
a deleting of the calculated result of the first layer in the one or more local memories dependent on a progression of the backward propagation operation and before the backward propagation operation has completed with respect to the early layer; and
a storing, by the one or more local memories and dependent on the deleting of the calculated result of the first layer, of a result of one of the performed training operations in the one or more local memories where the calculated result of the first layer was stored to by the storing of the calculated result,
wherein the storing of the calculated gradient of the first layer includes storing the calculated gradient in the global memory before the backward propagation operation has completed with respect to the early layer.
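The claimed memory flow can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Memory`, `train_step` names and the toy multiplicative "layers" are stand-ins invented for illustration, not the patent's apparatus. The sketch shows only the claimed sequencing — forward results held in local memory, each gradient pushed to global memory and the corresponding result deleted from local memory while backpropagation is still in progress, and the freed local-memory slot reused for a later training result.

```python
# Hypothetical sketch of the claimed scheme; Memory, train_step, and the
# multiplicative "layers" are illustrative placeholders, not the patent's design.

class Memory:
    """Toy key-value store standing in for a local or global memory."""
    def __init__(self, name):
        self.name = name
        self.slots = {}

    def store(self, key, value):
        self.slots[key] = value

    def delete(self, key):
        del self.slots[key]


def train_step(layers, x, local_mem, global_mem):
    # Forward propagation (early layer -> later layer): store each layer's
    # calculated result in local memory.
    for i, scale in enumerate(layers):
        x = x * scale
        local_mem.store(("act", i), x)

    # Backward propagation (later layer -> early layer).
    grad = 1.0
    for i in reversed(range(len(layers))):
        act = local_mem.slots[("act", i)]
        grad = grad * act  # gradient calculated from the stored result (toy math)

        # Store the gradient to global memory *before* backprop has
        # completed with respect to the early layer.
        global_mem.store(("grad", i), grad)

        # Delete the stored result as backprop progresses past this layer,
        # before the backward pass finishes.
        local_mem.delete(("act", i))

        # Reuse the freed local-memory slot for a result of a later
        # training operation (here, the running gradient as a stand-in).
        local_mem.store(("act", i), grad)
    return grad
```

With two toy layers `[2.0, 3.0]` and input `1.0`, the gradient for the later layer lands in global memory, its activation slot in local memory is freed and immediately reused, and only then does backpropagation reach the early layer — mirroring the ordering recited in the claim.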