US 12,260,614 B2
Neural network model training method and apparatus for complex characteristic classification and common localization
Jisoo Keum, Gyeonggi-do (KR); Sangil Oh, Seoul (KR); and Kyungnam Kim, Gyeonggi-do (KR)
Assigned to WAYCEN INC., Seoul (KR)
Appl. No. 17/777,246
Filed by WAYCEN INC., Seoul (KR)
PCT Filed Jul. 29, 2021, PCT No. PCT/KR2021/009939
§ 371(c)(1), (2) Date May 16, 2022
PCT Pub. No. WO2022/025690, PCT Pub. Date Feb. 3, 2022.
Claims priority of application No. 10-2020-0095773 (KR), filed on Jul. 31, 2020.
Prior Publication US 2022/0406035 A1, Dec. 22, 2022
Int. Cl. G06V 10/764 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01)
CPC G06V 10/764 (2022.01) [G06V 10/7747 (2022.01); G06V 10/82 (2022.01)]
9 Claims
 
1. A neural network model training method for complex characteristic classification and common localization of an image, wherein a neural network model comprises:
a convolution layer configured to perform a convolution operation on an input image by using a convolution filter;
a pooling layer configured to perform pooling on an output of the convolution layer; and
a plurality of class-specific fully connected layers configured to respectively correspond to a plurality of classes into which complex characteristics are classified and output values obtained by multiplying an output of the pooling layer by class-specific weights (Wfc(Tt)),
wherein different criteria distinguish each of the plurality of classes,
each of the plurality of classes is classified into a plurality of class-specific characteristics, and
the neural network model is capable of providing class-specific characteristic probabilities for the class-specific characteristics of each of the plurality of classes according to an output of each class-specific fully connected layer,
wherein the neural network model training method comprises:
(a) inputting the input image to the convolution layer;
(b) calculating class-specific observation maps for the plurality of respective classes on the basis of the output of the convolution layer;
(c) calculating an observation loss (Lobs) common to the plurality of classes on the basis of the class-specific observation maps; and
(d) back-propagating a loss based on the observation loss (Lobs) to the neural network model,
wherein step (c) comprises:
(c-1) generating a common observation map common to the plurality of classes on the basis of the class-specific observation maps; and
(c-2) calculating the observation loss (Lobs) by using the common observation map and a target region of the input image, and
wherein each step is performed by a computer processor, and
wherein the observation loss is calculated by calculating a cosine distance for concatenated values obtained by respectively projecting the common observation map and the target region of the input image in horizontal and vertical directions.
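
The observation loss recited in steps (c-1) and (c-2) and in the final wherein clause can be illustrated with a short sketch. It is not the patented implementation. The claim does not say how the common observation map is generated or how a projection is computed, so the sketch assumes an element-wise mean over the class-specific maps and a sum along each image axis. It also takes cosine distance to be one minus cosine similarity. All function names are hypothetical.

```python
import math


def common_observation_map(class_maps):
    """Combine class-specific observation maps into one common map.

    Assumption: element-wise mean. The claim only says the common map is
    generated "on the basis of" the class-specific maps.
    """
    n = len(class_maps)
    rows, cols = len(class_maps[0]), len(class_maps[0][0])
    return [
        [sum(m[r][c] for m in class_maps) / n for c in range(cols)]
        for r in range(rows)
    ]


def project_and_concatenate(grid):
    """Project a 2-D map in both directions and concatenate the results.

    Assumption: a projection is a sum along one axis.
    """
    per_row = [sum(row) for row in grid]            # collapse each row
    per_col = [sum(col) for col in zip(*grid)]      # collapse each column
    return per_row + per_col


def cosine_distance(u, v, eps=1e-12):
    """1 - cosine similarity; eps guards against all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def observation_loss(class_maps, target_region):
    """Sketch of L_obs: compare the common map with the target region."""
    common = common_observation_map(class_maps)
    return cosine_distance(
        project_and_concatenate(common),
        project_and_concatenate(target_region),
    )


# Toy 2x2 example: two class-specific maps and a binary target mask.
maps = [[[0.0, 0.2], [0.0, 0.8]], [[0.0, 0.0], [0.0, 1.0]]]
target = [[0, 0], [0, 1]]
loss = observation_loss(maps, target)
```

Under these assumptions the loss is close to zero when the common observation map concentrates on the same rows and columns as the target region, and it approaches one when the two occupy disjoint rows and columns. In step (d) this value, or a loss that includes it, is back-propagated through the network.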