| CPC G06F 9/5016 (2013.01) [G06F 9/5022 (2013.01); G06F 9/5044 (2013.01)] | 17 Claims |

|
1. A method for allocating on-chip memory of a neural processing unit, the method being performed by one or more processors and comprising:
in an on-chip memory area including a plurality of chunks classified as one of an allocated chunk, a cached chunk, or a free chunk, deallocating an allocated chunk finished with use of the memory and converting the deallocated chunk into the cached chunk;
receiving an on-chip memory allocation request for specific data;
determining, based on a comparison between a size of the specific data and a size of one or more cached chunks, whether there is a cached chunk of the one or more cached chunks that is allocable for the specific data; and
based on a result of determining whether there is the cached chunk that is allocable for the specific data, allocating the specific data to a specific cached chunk of the one or more cached chunks, or allocating the specific data to at least a portion of a classified free chunk,
wherein the one or more cached chunks include a first type cached chunk and a second type cached chunk, and
the first type cached chunk is a type of cached chunk such that the data to be allocated is allocated to the cached chunk if the size of the data to be allocated is the same as the size of data previously stored in the cached chunk, and
the second type cached chunk is a type of cached chunk such that the data to be allocated is allocated to the cached chunk if the size of the data to be allocated falls into a predefined range associated with the cached chunk less than the size of the cached chunk, and
the converting the deallocated chunk into the cached chunk includes converting, based on a type of data to which the allocated chunk finished with use of the memory was allocated, the deallocated chunk into the first type cached chunk or the second type cached chunk.
|
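The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. All names (`OnChipAllocator`, `Chunk`, `State`, `exact_kinds`, `range_ratio`) are hypothetical. The mapping from data type to cached-chunk type, the choice of range lower bound, and the first-fit search order are assumptions made only for illustration. The "predefined range" is interpreted here as sizes between a fraction of the chunk size and the chunk size itself.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class State(Enum):
    ALLOCATED = auto()
    CACHED_EXACT = auto()  # "first type": reusable only for same-size data
    CACHED_RANGE = auto()  # "second type": reusable for sizes in a range
    FREE = auto()


@dataclass
class Chunk:
    offset: int
    size: int
    state: State = State.FREE
    data_size: int = 0                    # size of the data stored (or last stored)
    data_kind: str = ""                   # hypothetical label for the data type
    size_range: Tuple[int, int] = (0, 0)  # accepted sizes for CACHED_RANGE


class OnChipAllocator:
    def __init__(self, total_size: int, exact_kinds=("weight",), range_ratio=0.5):
        # The whole on-chip area starts as a single free chunk.
        self.chunks: List[Chunk] = [Chunk(0, total_size)]
        # Assumption: data kinds listed here produce first-type cached chunks.
        self.exact_kinds = set(exact_kinds)
        # Assumption: the range lower bound is a fixed fraction of the chunk size.
        self.range_ratio = range_ratio

    def deallocate(self, chunk: Chunk) -> None:
        """Convert a finished allocated chunk into a cached chunk by data type."""
        if chunk.data_kind in self.exact_kinds:
            chunk.state = State.CACHED_EXACT
        else:
            chunk.state = State.CACHED_RANGE
            chunk.size_range = (int(chunk.size * self.range_ratio), chunk.size)

    def _take(self, chunk: Chunk, size: int, kind: str) -> Chunk:
        chunk.state = State.ALLOCATED
        chunk.data_size = size
        chunk.data_kind = kind
        return chunk

    def allocate(self, size: int, kind: str) -> Optional[Chunk]:
        """Try cached chunks first, then fall back to splitting a free chunk."""
        for c in self.chunks:
            if c.state is State.CACHED_EXACT and size == c.data_size:
                return self._take(c, size, kind)
            if (c.state is State.CACHED_RANGE
                    and c.size_range[0] <= size <= c.size_range[1]):
                return self._take(c, size, kind)
        for i, c in enumerate(self.chunks):
            if c.state is State.FREE and c.size >= size:
                if c.size > size:
                    # Use only a portion of the free chunk; the rest stays free.
                    self.chunks.insert(i + 1, Chunk(c.offset + size, c.size - size))
                    c.size = size
                return self._take(c, size, kind)
        return None  # no cached or free chunk can hold the data
```

In this sketch, a chunk released after holding `"weight"` data is reused only by a request of exactly the same size, while a chunk released after holding any other kind of data is reused by any request whose size falls within its recorded range. Requests that match no cached chunk are carved out of a free chunk.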