| CPC G06V 10/764 (2022.01) [G06V 10/7747 (2022.01); G06V 10/82 (2022.01)] | 9 Claims |

1. A neural network model training method for complex characteristic classification and common localization of an image, wherein a neural network model comprises:
a convolution layer configured to perform a convolution operation on an input image by using a convolution filter;
a pooling layer configured to perform pooling on an output of the convolution layer; and
a plurality of class-specific fully connected layers configured to respectively correspond to a plurality of classes into which complex characteristics are classified and output values obtained by multiplying an output of the pooling layer by class-specific weights (Wfc(Tt)),
wherein each of the plurality of classes is distinguished by a different criterion,
each of the plurality of classes is classified into a plurality of class-specific characteristics, and
the neural network model is capable of providing class-specific characteristic probabilities for the class-specific characteristics of each of the plurality of classes according to an output of each class-specific fully connected layer,
wherein the neural network model training method comprises:
(a) inputting the input image to the convolution layer;
(b) calculating class-specific observation maps for the plurality of respective classes on the basis of the output of the convolution layer;
(c) calculating an observation loss (Lobs) common to the plurality of classes on the basis of the class-specific observation maps; and
(d) back-propagating a loss based on the observation loss (Lobs) to the neural network model,
wherein step (c) comprises:
(c-1) generating a common observation map common to the plurality of classes on the basis of the class-specific observation maps; and
(c-2) calculating the observation loss (Lobs) by using the common observation map and a target region of the input image, and
wherein each step is performed by a computer processor, and
wherein the observation loss is calculated by calculating a cosine distance for concatenated values obtained by respectively projecting the common observation map and the target region of the input image in horizontal and vertical directions.
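The final wherein clause can be illustrated with a short sketch: the common observation map and the target region are each projected in the horizontal and vertical directions, the two projections are concatenated per input, and the observation loss is the cosine distance between the concatenated vectors. This is a hedged, illustrative implementation only; the function and variable names are not from the claim, and the claim does not specify the projection operator (summation along each axis is assumed here).

```python
import numpy as np

def observation_loss(common_obs_map, target_region):
    """Illustrative sketch of the claimed observation loss (L_obs).

    Assumptions (not specified by the claim): projection is a sum
    along each axis, and both inputs are 2-D arrays of equal shape
    (the target region given as a binary mask).
    """
    def project(m):
        horiz = m.sum(axis=0)   # projection in the horizontal direction
        vert = m.sum(axis=1)    # projection in the vertical direction
        return np.concatenate([horiz, vert])

    p_obs = project(common_obs_map)
    p_tgt = project(target_region)
    # Cosine distance = 1 - cosine similarity (small epsilon avoids
    # division by zero for all-zero maps).
    cos_sim = p_obs @ p_tgt / (np.linalg.norm(p_obs) * np.linalg.norm(p_tgt) + 1e-8)
    return 1.0 - cos_sim
```

Under these assumptions the loss approaches 0 when the observation map coincides with the target region and 1 when their axis projections do not overlap at all.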