US 12,229,595 B2
Method and system for allocating on-chip memory of neural processing unit
Minhoo Kang, Seongnam-si (KR)
Assigned to REBELLIONS INC., Seongnam-si (KR)
Filed by REBELLIONS INC., Seongnam-si (KR)
Filed on May 23, 2024, as Appl. No. 18/673,214.
Application 18/673,214 is a continuation of application No. 18/389,676, filed on Dec. 19, 2023, granted, now 12,026,552.
Claims priority of application No. 10-2023-0037428 (KR), filed on Mar. 22, 2023.
Prior Publication US 2024/0320044 A1, Sep. 26, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/46 (2006.01); G06F 9/50 (2006.01)

CPC G06F 9/5016 (2013.01) [G06F 9/5022 (2013.01); G06F 9/5044 (2013.01)]

17 Claims

1. A method for allocating on-chip memory of a neural processing unit, the method being performed by one or more processors and comprising:

in an on-chip memory area including a plurality of chunks classified as one of an allocated chunk, a cached chunk, or a free chunk, deallocating an allocated chunk finished with use of the memory and converting the deallocated chunk into the cached chunk;

receiving an on-chip memory allocation request for specific data;

determining, based on a comparison between a size of the specific data and a size of one or more cached chunks, whether there is a cached chunk of the one or more cached chunks that is allocable for the specific data; and

based on a result of determining whether there is the cached chunk that is allocable for the specific data, allocating the specific data to a specific cached chunk of the one or more cached chunks, or allocating the specific data to at least a portion of a classified free chunk,

wherein the one or more cached chunks include a first type cached chunk and a second type cached chunk, and

the first type cached chunk is a type of cached chunk such that the data to be allocated is allocated to the cached chunk if the size of the data to be allocated is the same size of a previously stored data of the cached chunk, and

the second type cached chunk is a type of cached chunk such that the data to be allocated is allocated to the cached chunk if the size of the data to be allocated falls into a predefined range associated with the cached chunk less than the size of the cached chunk, and

the converting the deallocated chunk into the cached chunk includes converting, based on a type of data to which the allocated chunk finished with use of the memory was allocated, the deallocated chunk into the first type cached chunk or the second type cached chunk.
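
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Chunk`, `OnChipAllocator`, `data_kind`, and `range_ratio` are hypothetical, as are the "fixed"/"variable" data kinds, the first-fit search order, and the rule that a second type cached chunk accepts sizes between `range_ratio * chunk.size` and `chunk.size`. The claim only requires that the range be predefined and associated with the chunk, and that the cached type be chosen from the type of the data previously held.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    offset: int
    size: int
    state: str = "free"               # "allocated", "cached", or "free"
    cache_type: Optional[str] = None  # "exact" (first type) or "range" (second type)
    data_size: int = 0                # size of the data most recently stored here
    data_kind: str = ""               # hypothetical label for the type of that data


class OnChipAllocator:
    def __init__(self, total_size: int, range_ratio: float = 0.5) -> None:
        # The whole on-chip area starts as a single free chunk.
        self.chunks: List[Chunk] = [Chunk(0, total_size)]
        self.range_ratio = range_ratio  # assumed lower bound of the reuse range

    def _fits_cached(self, chunk: Chunk, size: int) -> bool:
        if chunk.state != "cached":
            return False
        if chunk.cache_type == "exact":
            # First type: reuse only for data of the same size as before.
            return size == chunk.data_size
        # Second type: reuse when the size falls in a range bounded by the chunk size.
        return self.range_ratio * chunk.size <= size <= chunk.size

    def _occupy(self, chunk: Chunk, size: int, data_kind: str) -> Chunk:
        chunk.state = "allocated"
        chunk.cache_type = None
        chunk.data_size = size
        chunk.data_kind = data_kind
        return chunk

    def allocate(self, size: int, data_kind: str) -> Chunk:
        # 1. Prefer an allocable cached chunk.
        for chunk in self.chunks:
            if self._fits_cached(chunk, size):
                return self._occupy(chunk, size, data_kind)
        # 2. Otherwise carve the request out of a free chunk (first fit).
        for i, chunk in enumerate(self.chunks):
            if chunk.state == "free" and chunk.size >= size:
                if chunk.size > size:
                    rest = Chunk(chunk.offset + size, chunk.size - size)
                    chunk.size = size
                    self.chunks.insert(i + 1, rest)
                return self._occupy(chunk, size, data_kind)
        raise MemoryError("no cached or free chunk can hold the request")

    def release(self, chunk: Chunk) -> None:
        # Deallocate and convert into a cached chunk; the cached type is
        # chosen from the kind of data the chunk held (assumed mapping).
        chunk.state = "cached"
        chunk.cache_type = "exact" if chunk.data_kind == "fixed" else "range"


alloc = OnChipAllocator(1024)
a = alloc.allocate(256, "fixed")     # carved from the free chunk at offset 0
alloc.release(a)                     # becomes a first type ("exact") cached chunk
b = alloc.allocate(256, "variable")  # same size, so the cached chunk is reused
alloc.release(b)                     # becomes a second type ("range") cached chunk
c = alloc.allocate(200, "fixed")     # 128 <= 200 <= 256, so it is reused again
```

In this sketch a request that matches no cached chunk always falls through to the free area, which mirrors the "or allocating the specific data to at least a portion of a classified free chunk" branch of the claim. Eviction of cached chunks and merging of adjacent free chunks are not recited in claim 1 and are left out.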