US 12,412,070 B2
Neural processing unit that reuses feature maps and its operation method
Dong Hyun Min, Goyang-si (KR); Jung Boo Park, Seoul (KR); Kyeong Han Kim, Yongin-si (KR); Ho Seung Kim, Suwon-si (KR); and Hyun Jin Kim, Hwaseong-si (KR)
Assigned to DEEPX CO., LTD., Seongnam-si (KR)
Filed by DEEPX CO., LTD., Seongnam-si (KR)
Filed on Apr. 15, 2024, as Appl. No. 18/636,196.
Claims priority of application No. 10-2023-0171760 (KR), filed on Nov. 30, 2023.
Prior Publication US 2025/0181884 A1, Jun. 5, 2025
Int. Cl. G06N 3/04 (2023.01); G06F 3/06 (2006.01); G06N 3/063 (2023.01)
CPC G06N 3/04 (2013.01) [G06F 3/0604 (2013.01); G06N 3/063 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A method of reusing feature maps that is operable on a neural processing unit (NPU), the method comprising:
(a) calculating, for a first layer of at least one artificial neural network (ANN) model including a plurality of layers, a space cost based on sizes of an input feature map, an output feature map, and a weight of a subsequent layer of the first layer;
(b) calculating a caching value for an operation of the first layer;
(c) determining a caching profit for the operation of the first layer based on the space cost and the caching value, the caching profit being cumulatively calculated per operation;
(d) selecting at least one caching entry from among caching candidate entries, each of the caching candidate entries corresponding to an output feature map used for computation in a particular layer among the plurality of layers;
(e) determining a caching entry of the at least one caching entry having maximum caching profit among the at least one output feature map of the caching candidate entries, the caching entry with the maximum caching profit determined based on the cumulatively calculated caching profit; and
(f) storing the caching entry in a variable memory including a plurality of memory units, by selectively storing the caching entry in at least one memory unit of the plurality of memory units to exclude unnecessary data among the at least one caching entry, the selective storing considering each layer between the first layer and a delta-step layer (Δ-step layer), where Δ is an integer greater than or equal to two,
wherein the method is executed by the NPU, the NPU including the at least one ANN model.
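Steps (a) through (e) of the claim amount to a greedy, profit-driven selection of which output feature map to keep on-chip. The sketch below illustrates that flow under stated assumptions: the `Layer` fields, the specific cost formulas (space cost as the sum of input feature map, output feature map, and subsequent-layer weight sizes; caching value as bytes of memory traffic avoided per reuse), and the per-step accumulation over the Δ-step window are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    ifmap_size: int   # size of the input feature map (bytes)
    ofmap_size: int   # size of the output feature map (bytes)
    weight_size: int  # weight size of the subsequent layer (bytes)
    reuse_count: int  # how many later operations reuse this output (assumed known)

def select_caching_entry(layers, delta=2):
    """Sketch of the claimed selection: for each caching candidate
    (a layer's output feature map), accumulate caching profit over a
    delta-step window and return the candidate with maximum profit.
    delta must be an integer >= 2, as in step (f) of the claim."""
    best_entry, best_profit = None, float("-inf")
    for i, layer in enumerate(layers):
        # (a) space cost: memory pressure of holding this candidate,
        # based on input/output feature map and subsequent-layer weight sizes
        space_cost = layer.ifmap_size + layer.ofmap_size + layer.weight_size
        # (b) caching value: illustrative metric -- bytes of off-chip
        # traffic avoided each time the cached output is reused
        caching_value = layer.ofmap_size * layer.reuse_count
        # (c) caching profit, cumulatively calculated per operation
        # across each layer between this layer and the delta-step layer
        profit = 0
        for _step in range(i, min(i + delta, len(layers))):
            profit += caching_value - space_cost
        # (d)-(e) track the caching entry with maximum cumulative profit
        if profit > best_profit:
            best_entry, best_profit = layer.name, profit
    return best_entry, best_profit
```

Step (f), storing the winning entry across selected memory units of a variable memory, is a hardware placement concern and is not modeled here; the sketch only shows the profit computation and max-profit selection.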