| CPC G06V 10/764 (2022.01) [G06V 10/7747 (2022.01); G06V 10/82 (2022.01)] | 9 Claims |

1. A neural network model training method for complex characteristic classification and common localization of an image, wherein a neural network model comprises:
a convolution layer configured to perform a convolution operation on an input image by using a convolution filter;
a pooling layer configured to perform pooling on an output of the convolution layer; and
a plurality of class-specific fully connected layers configured to respectively correspond to a plurality of classes into which complex characteristics are classified and output values obtained by multiplying an output of the pooling layer by class-specific weights (Wfc(Tt)),
wherein each of the plurality of classes is distinguished by a different criterion,
each of the plurality of classes is classified into a plurality of class-specific characteristics, and
the neural network model is capable of providing class-specific characteristic probabilities for the class-specific characteristics of each of the plurality of classes according to an output of each class-specific fully connected layer,
wherein the neural network model training method comprises:
(a) inputting the input image to the convolution layer;
(b) calculating class-specific observation maps for the plurality of respective classes on the basis of the output of the convolution layer;
(c) calculating an observation loss (Lobs) common to the plurality of classes on the basis of the class-specific observation maps; and
(d) back-propagating a loss based on the observation loss (Lobs) to the neural network model,
wherein step (c) comprises:
(c-1) generating a common observation map common to the plurality of classes on the basis of the class-specific observation maps; and
(c-2) calculating the observation loss (Lobs) by using the common observation map and a target region of the input image, and
wherein each step is performed by a computer processor, and
wherein the observation loss is calculated by calculating a cosine distance for concatenated values obtained by respectively projecting the common observation map and the target region of the input image in horizontal and vertical directions.
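The final wherein clause can be illustrated with a short sketch: the common observation map and the target region are each projected in the horizontal and vertical directions, the two projections are concatenated per input, and the observation loss is the cosine distance between the concatenated vectors. This is a hedged, illustrative implementation only; the function and variable names are not from the claim, and the claim does not specify the projection operator (summation along each axis is assumed here).

```python
import numpy as np

def observation_loss(common_obs_map, target_region):
    """Illustrative sketch of the claimed observation loss (L_obs).

    Assumptions (not specified by the claim): projection is a sum
    along each axis, and both inputs are 2-D arrays of equal shape
    (the target region given as a binary mask).
    """
    def project(m):
        horiz = m.sum(axis=0)   # projection in the horizontal direction
        vert = m.sum(axis=1)    # projection in the vertical direction
        return np.concatenate([horiz, vert])

    p_obs = project(common_obs_map)
    p_tgt = project(target_region)
    # Cosine distance = 1 - cosine similarity (small epsilon avoids
    # division by zero for all-zero maps).
    cos_sim = p_obs @ p_tgt / (np.linalg.norm(p_obs) * np.linalg.norm(p_tgt) + 1e-8)
    return 1.0 - cos_sim
```

Under these assumptions the loss approaches 0 when the observation map coincides with the target region and 1 when their axis projections do not overlap at all.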