US 12,367,663 B1
Method for selecting training data to train a deep learning model and training data selecting device using the same
Kye Hyeon Kim, Seoul (KR); and Hyundong Lee, Seoul (KR)
Assigned to Superb AI Co., Ltd., Seoul (KR)
Filed by Superb AI Co., Ltd., Seoul (KR)
Filed on Nov. 7, 2024, as Appl. No. 18/940,100.
Int. Cl. G06V 10/774 (2022.01); G06V 10/26 (2022.01); G06V 10/762 (2022.01); G06V 20/70 (2022.01)
CPC G06V 10/774 (2022.01) [G06V 10/26 (2022.01); G06V 10/763 (2022.01); G06V 20/70 (2022.01)]
26 Claims
 
1. A method for selecting training data to be used for training a deep learning model, comprising steps of:
(a) obtaining, by a training data selecting device, one or more individual attributes each of which corresponds to each of a plurality of training data included in total training data stored in a data pool, and generating, by the training data selecting device, a bipartite graph by matching each of the plurality of training data included in the total training data with the individual attributes; and
(b) selecting, by the training data selecting device, n training data, which are matched with the individual attributes, among the total training data, by referring to the bipartite graph, wherein the n is a target number of the training data to be used for training the deep learning model and is a plural number, and wherein the training data selecting device selects the n training data to be used for training the deep learning model such that each cardinal number of each of the individual attributes matched with the n training data is within a predetermined deviation threshold;
wherein, at the step of (b), the individual attributes include a (1_1)-st individual attribute to a (1_x)-th individual attribute corresponding to a first attribute type of each of the plurality of training data and a (2_1)-st individual attribute to a (2_y)-th individual attribute corresponding to a second attribute type of each of the plurality of training data, wherein x and y are respectively integers greater than or equal to 1,
wherein the training data selecting device selects the n training data such that a cardinal number of the (1_1)-st individual attribute to the (1_x)-th individual attribute corresponding to the first attribute type and a cardinal number of the (2_1)-st individual attribute to the (2_y)-th individual attribute corresponding to the second attribute type, which are matched with the n training data, are within the predetermined deviation threshold, wherein a cardinal number of the (1_1)-st individual attribute to a cardinal number of the (1_x)-th individual attribute are within a first deviation threshold, and wherein a cardinal number of the (2_1)-st individual attribute to a cardinal number of the (2_y)-th individual attribute are within a second deviation threshold.
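Steps (a) and (b) of claim 1 can be illustrated with a short Python sketch. It is not the patented implementation. The claim does not say how the n training data are chosen, so the sketch uses a simple greedy heuristic as a stand-in. It also assumes that "within a deviation threshold" means the largest and smallest cardinal numbers within one attribute type differ by at most that type's threshold. The function names, the sample pool, and the attribute types `weather` and `time` are hypothetical.

```python
from collections import defaultdict


def build_bipartite_graph(pool):
    """Step (a): match each training datum with its individual attributes.

    pool maps a data id to {attribute_type: attribute_value}.
    Returns {(attribute_type, attribute_value): set of data ids}, i.e. the
    edges of a bipartite graph between data nodes and attribute nodes.
    """
    graph = defaultdict(set)
    for data_id, attrs in pool.items():
        for attr_type, value in attrs.items():
            graph[(attr_type, value)].add(data_id)
    return dict(graph)


def select_balanced(pool, n):
    """Step (b), sketched with a greedy heuristic (an assumption, not the
    claimed method): repeatedly pick the datum whose attributes are
    currently the least represented among the data selected so far.
    """
    graph = build_bipartite_graph(pool)
    counts = {attr: 0 for attr in graph}  # cardinal number per attribute
    remaining = set(pool)
    selected = []
    while len(selected) < n and remaining:
        # Sorting makes tie-breaking deterministic (lowest data id wins).
        best = min(
            sorted(remaining),
            key=lambda d: sum(counts[(t, v)] for t, v in pool[d].items()),
        )
        selected.append(best)
        remaining.remove(best)
        for t, v in pool[best].items():
            counts[(t, v)] += 1
    return selected, counts


def within_thresholds(counts, thresholds):
    """Check, per attribute type, that max count - min count <= threshold.

    thresholds maps each attribute type to its own deviation threshold
    (the first and second deviation thresholds of the claim).
    """
    by_type = defaultdict(list)
    for (attr_type, _), count in counts.items():
        by_type[attr_type].append(count)
    return all(max(c) - min(c) <= thresholds[t] for t, c in by_type.items())


# Hypothetical data pool, skewed toward sunny daytime images.
pool = {
    "img0": {"weather": "sunny", "time": "day"},
    "img1": {"weather": "sunny", "time": "day"},
    "img2": {"weather": "sunny", "time": "day"},
    "img3": {"weather": "sunny", "time": "night"},
    "img4": {"weather": "rainy", "time": "day"},
    "img5": {"weather": "rainy", "time": "night"},
    "img6": {"weather": "sunny", "time": "day"},
    "img7": {"weather": "rainy", "time": "night"},
}

selected, counts = select_balanced(pool, n=4)
ok = within_thresholds(counts, {"weather": 1, "time": 1})
print(selected, counts, ok)
```

A greedy pass of this kind does not guarantee that the thresholds are met for every pool. The `within_thresholds` check is kept separate so that a caller can verify the selection and fall back to another strategy if it fails.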