US 12,229,656 B2
Method and apparatus for performing convolution operation for optimizing arithmetic intensity of device
Shin Kook Choi, Seoul (KR); and Jun Kyeong Choi, Seoul (KR)
Assigned to NOTA, INC., Daejeon (KR)
Filed by NOTA, INC., Daejeon (KR)
Filed on Jul. 19, 2023, as Appl. No. 18/355,008.
Claims priority of application No. 10-2022-0089835 (KR), filed on Jul. 20, 2022; and application No. 10-2022-0157421 (KR), filed on Nov. 22, 2022.
Prior Publication US 2024/0028875 A1, Jan. 25, 2024
Int. Cl. G06N 3/0464 (2023.01)
CPC G06N 3/0464 (2023.01)
20 Claims
 
1. A method of performing a convolution operation in a neural network, which is performed by an apparatus, the method comprising:
receiving input data; and
generating output data through a convolution model of the neural network for a device, by using the input data,
wherein the convolution model comprises a plurality of convolutional layers and a first reshape layer,
wherein the plurality of convolutional layers comprise an improved first convolutional layer that satisfies preset conditions related to latency characteristics of the device,
wherein the improved first convolutional layer is a layer obtained by modifying a first convolutional layer, which performs a first convolution operation on an input feature map using a base filter, to include a modified filter modified from the base filter of the first convolutional layer based on a predetermined final division value, and to perform a second convolution operation on a first feature map, which is modified by dividing the input feature map based on the predetermined final division value,
wherein the first reshape layer is positioned before the improved first convolutional layer and modifies the input feature map to generate the first feature map having an increased spatial size and a reduced number of input channels according to the final division value, compared to the input feature map,
wherein the improved first convolutional layer performs the second convolution operation on the first feature map, using the modified filter, sharing weights of the modified filter over the first feature map, and
wherein the final division value is determined by considering the latency characteristics such that an overall arithmetic intensity of the improved first convolutional layer is increased compared to the first convolutional layer.
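
The reshape step and the arithmetic-intensity comparison recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation; it is one possible reading under stated assumptions. The function names, the choice to tile channel groups along the width axis, the element-count cost model, and the example layer sizes are all hypothetical. Boundary effects between tiles and the device latency characteristics that the claim uses to select the final division value are not modeled.

```python
import numpy as np


def reshape_channels_to_space(x, d):
    """Hypothetical reshape layer: (C, H, W) -> (C // d, H, W * d).

    Assumption: the C channels are split into d consecutive groups and the
    groups are laid side by side along the width axis, so the spatial size
    grows by d while the number of input channels shrinks by d.
    """
    c, h, w = x.shape
    if c % d != 0:
        raise ValueError("channel count must be divisible by the division value")
    groups = x.reshape(d, c // d, h, w)
    return np.concatenate(list(groups), axis=-1)


def conv_arithmetic_intensity(c_in, c_out, h, w, k, d=1):
    """Rough FLOPs-per-element-moved estimate for a k x k convolution.

    Assumptions: stride 1, 'same' padding, one multiply and one add per MAC,
    and memory traffic equal to input + weights + output element counts.
    With d > 1 the filter sees c_in // d input channels and is shared over
    a feature map that is d times larger spatially.
    """
    positions = h * w * d
    flops = 2 * positions * k * k * (c_in // d) * c_out
    elems_moved = (
        c_in * h * w                  # input feature map (total size unchanged)
        + k * k * (c_in // d) * c_out  # modified filter weights
        + c_out * positions            # output feature map
    )
    return flops / elems_moved


# Example with hypothetical sizes: a weight-heavy layer on a small feature map.
x = np.zeros((512, 7, 7), dtype=np.float32)
x_reshaped = reshape_channels_to_space(x, d=4)  # shape (128, 7, 28)
base = conv_arithmetic_intensity(512, 512, 7, 7, k=3, d=1)
improved = conv_arithmetic_intensity(512, 512, 7, 7, k=3, d=4)
```

Under this simplified cost model the FLOP count is unchanged while the weight traffic shrinks by the division value, so the estimated intensity rises for layers whose weights dominate memory traffic. For layers with large feature maps and few channels, the larger output can outweigh the saving, which is consistent with the claim choosing the division value per device rather than fixing it.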