US 11,954,898 B1
Learning method and learning device for performing transfer learning on an object detector that has been trained to detect first object classes such that the object detector is able to detect second object classes, and testing method and testing device using the same
Kye Hyeon Kim, Suwon-si (KR)
Assigned to SUPERB AI CO., LTD., Seoul (KR)
Filed by Superb AI Co., Ltd., Seoul (KR)
Filed on Oct. 27, 2023, as Appl. No. 18/384,664.
Claims priority of application No. 10-2022-0149465 (KR), filed on Nov. 10, 2022.
Int. Cl. G06V 10/764 (2022.01); G06V 10/22 (2022.01); G06V 10/42 (2022.01); G06V 10/766 (2022.01); G06V 10/77 (2022.01); G06V 10/774 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01)

CPC G06V 10/764 (2022.01) [G06V 10/22 (2022.01); G06V 10/421 (2022.01); G06V 10/766 (2022.01); G06V 10/7715 (2022.01); G06V 10/774 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01)]

16 Claims

1. A learning method for performing transfer learning on an object detector that has been trained to detect first object classes such that the object detector is able to detect second object classes, comprising steps of:

(a) on condition that (i) first training images including one or more first objects corresponding to the first object classes have been acquired from a first training data set, (ii) first feature maps have been outputted by applying at least one convolution operation to each of the first training images through at least one convolutional layer, (iii) first ROI proposals have been outputted, wherein the first ROI proposals have been acquired by predicting object regions in each of the first feature maps corresponding to each of the first training images through a first ROI (region of interest) proposal network, (iv) first pooled feature maps have been outputted, wherein the first pooled feature maps have been acquired by pooling each of regions corresponding to the first ROI proposals in each of the first feature maps through a pooling layer, (v) first FC outputs have been generated by applying first FC operation to the first pooled feature maps through a first FC (fully-connected) layer, (vi) pieces of first class prediction information and pieces of first regression prediction information corresponding to objects of the first training images have been outputted by applying second FC operation to the first FC outputs through a second FC layer, (vii) first class losses and first regression losses have been acquired by referring to pieces of first class GT (ground truth) information and pieces of first regression GT information corresponding to each of pieces of the first class prediction information and pieces of the first regression prediction information, (viii) the first class losses and the first regression losses have been backpropagated and thus first parameters of the convolutional layer, second parameters of the first ROI proposal network, third parameters of the first FC layer and fourth parameters of the second FC layer have been trained, in response to acquiring second training images including at least one of second objects corresponding to the second object classes from a second training data set, a learning device instructing the convolutional layer having the first parameters trained in advance to output second feature maps by applying the convolution operation to each of the second training images;

(b) the learning device (i) instructing each of the first ROI proposal network having the second parameters trained in advance and a second ROI proposal network having fifth parameters that have not been trained to perform a process of predicting object regions in each of the second training images by referring to each of the second feature maps, thereby outputting each of (2_1)-st ROI proposals and (2_2)-nd ROI proposals, and (ii) instructing the pooling layer to pool regions corresponding to each of the (2_1)-st ROI proposals and the (2_2)-nd ROI proposals in each of the second feature maps, thereby outputting second pooled feature maps;

(c) the learning device (i) instructing the first FC layer having the third parameters trained in advance to generate second FC outputs by applying the first FC operation to the second pooled feature maps, and (ii) instructing the second FC layer having the fourth parameters that have not been trained to apply the second FC operation to the second FC outputs, thereby outputting pieces of second class prediction information and pieces of second regression prediction information corresponding to objects on the second training images; and

(d) the learning device (i) acquiring second class losses and second regression losses by referring to pieces of the second class prediction information, pieces of the second regression prediction information and pieces of second class GT information and pieces of second regression GT information, respectively corresponding to pieces of the second class prediction information and pieces of the second regression prediction information, and (ii) backpropagating the second class losses and the second regression losses, thereby further training the fifth parameters of the second ROI proposal network that have not been trained, the third parameters of the first FC layer that have been trained in advance and the fourth parameters of the second FC layer that have not been trained.
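
For orientation, the parameter handling recited in steps (a) through (d) can be summarized in a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The names `ParamGroup`, `transfer_plan`, `trainable_names` and `merge_proposals` are hypothetical. The sketch assumes that parameters not listed in step (d) (the first and second parameters) stay fixed during transfer learning, that the fourth parameters are treated as untrained at this stage as step (c) states, and that the two proposal sets of step (b) are simply concatenated before pooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamGroup:
    name: str         # ordinal used in the claim, e.g. "first" for the first parameters
    module: str       # component that owns the parameters
    pretrained: bool  # carries weights trained on the first training data set
    trainable: bool   # updated by backpropagation in step (d)


def transfer_plan():
    """Parameter groups of claim 1 during the transfer-learning stage.

    Assumption: groups not named in step (d) are kept fixed.
    """
    return [
        ParamGroup("first", "convolutional layer", True, False),
        ParamGroup("second", "first ROI proposal network", True, False),
        ParamGroup("third", "first FC layer", True, True),
        # Step (c) describes the fourth parameters as not trained at this stage.
        ParamGroup("fourth", "second FC layer", False, True),
        ParamGroup("fifth", "second ROI proposal network", False, True),
    ]


def trainable_names(plan):
    """Names of the groups that receive gradient updates in step (d)."""
    return [group.name for group in plan if group.trainable]


def merge_proposals(rois_first, rois_second):
    """Step (b)(ii): both proposal sets are pooled.

    Assumption: the sets are concatenated, with no deduplication or
    non-maximum suppression, which the claim does not specify.
    """
    return list(rois_first) + list(rois_second)
```

Under these assumptions, `trainable_names(transfer_plan())` lists the third, fourth and fifth parameters, matching the groups named in step (d), while the convolutional layer and the first ROI proposal network are only run forward.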