US 12,455,945 B2
Device and in particular a computer-implemented method for classifying data sets
Lukas Lange, Pforzheim (DE); and Jannik Stroetgen, Karlsruhe (DE)
Assigned to ROBERT BOSCH GMBH, Stuttgart (DE)
Filed by Robert Bosch GmbH, Stuttgart (DE)
Filed on Mar. 11, 2022, as Appl. No. 17/654,417.
Claims priority of application No. 10 2021 202 564.1 (DE), filed on Mar. 16, 2021.
Prior Publication US 2022/0300750 A1, Sep. 22, 2022
Int. Cl. G06F 18/00 (2023.01); G06F 18/214 (2023.01); G06F 18/2413 (2023.01); G06F 18/2431 (2023.01)
CPC G06F 18/2413 (2023.01) [G06F 18/214 (2023.01); G06F 18/2431 (2023.01)] 7 Claims
 
1. A computer-implemented method for classifying data sets, the method comprising:
predefining a model for solving a task;
predefining a plurality of training data sets;
defining, for each training data set from the plurality of training data sets, a respective trained model for solving the task by pretraining the model on the training data set and training the model on a reference training data set;
determining a trained reference model for solving the task by training the model on the reference training data set without pretraining with the plurality of training data sets;
determining, for each respective trained model, a respective quality of solving the task;
determining, for the trained reference model, a reference quality of solving the task;
classifying each respective trained model as suitable or unsuitable for the pretraining as a function of a deviation of the respective quality from the reference quality;
determining, in the plurality of training data sets, nearest neighbors of a data set of the plurality of training data sets; and
either classifying the data set as suitable or unsuitable for the pretraining as a function of how the trained models, which have been trained with the nearest neighbors, are classified, or classifying the nearest neighbors of the data set as suitable for the pretraining, wherein for training of the model, the model is pretrained with at least one of the training data sets, which is classified as suitable for the pretraining, and the model for solving the task is subsequently trained with the data set.
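The comparison at the heart of claim 1 can be sketched in Python. Each candidate data set is used for pretraining followed by training on the reference set. The result is then compared with a reference model trained on the reference set alone, and each candidate is classified by the deviation in quality. This is a minimal sketch under stated assumptions, not the patented implementation. The callable `quality_of` is a hypothetical stand-in for the actual training and evaluation runs. Quality is assumed to be a scalar where higher is better (for example an F1 score). The rule "suitable if the deviation exceeds `threshold`" is one hypothetical choice for the "function of a deviation" that the claim leaves open. All names and scores are made up for illustration.

```python
from typing import Callable, Dict, Iterable, Optional


def classify_pretraining_sets(
    training_sets: Iterable[str],
    quality_of: Callable[[Optional[str]], float],
    threshold: float = 0.0,
) -> Dict[str, bool]:
    """Label each candidate pretraining data set as suitable (True) or unsuitable (False).

    quality_of(name) stands in for: pretrain the model on data set `name`, train it
    on the reference training data set, and score it on the task (higher is better).
    quality_of(None) stands in for the reference model trained without pretraining.
    """
    reference_quality = quality_of(None)
    labels = {}
    for name in training_sets:
        deviation = quality_of(name) - reference_quality
        # Hypothetical rule: suitable only if pretraining beats the reference by more than `threshold`.
        labels[name] = deviation > threshold
    return labels


# Made-up toy scores in place of real training runs.
toy_scores: Dict[Optional[str], float] = {None: 0.70, "news": 0.74, "tweets": 0.66, "wiki": 0.71}
labels = classify_pretraining_sets(["news", "tweets", "wiki"], lambda name: toy_scores[name])
print(labels)  # {'news': True, 'tweets': False, 'wiki': True}
```

A positive threshold makes the rule stricter. With `threshold=0.02`, for example, `wiki` (deviation of about 0.01) would also be classified as unsuitable.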
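The nearest-neighbor step of claim 1 can be sketched in the same spirit. The claim does not say how data sets are compared. This sketch assumes each training data set is summarized by a feature vector, which is a hypothetical representation (for example, an averaged embedding). It uses Euclidean distance with a hypothetical neighbor count `k`. For the first alternative, the data set is classified by a majority vote over its neighbors' suitability labels. Returning the neighbors themselves as pretraining candidates corresponds to the second alternative. The function names, vectors, and labels are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def nearest_neighbors(target: str, vectors: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Names of the k other training data sets closest to `target` (Euclidean distance)."""
    others = [name for name in vectors if name != target]
    others.sort(key=lambda name: math.dist(vectors[target], vectors[name]))
    return others[:k]


def classify_from_neighbors(
    target: str,
    vectors: Dict[str, Sequence[float]],
    labels: Dict[str, bool],
    k: int = 3,
) -> bool:
    """First alternative: suitable if most neighbors' trained models were labeled suitable."""
    neighbors = nearest_neighbors(target, vectors, k)
    votes = sum(labels[name] for name in neighbors)
    # Hypothetical rule: strict majority; a tie counts as unsuitable.
    return 2 * votes > len(neighbors)


# Toy 2-D "data set vectors" and suitability labels (made up for illustration).
vectors = {
    "news": (0.0, 1.0),
    "wiki": (0.2, 0.9),
    "tweets": (5.0, 5.0),
    "forum": (4.8, 5.1),
    "legal": (0.1, 1.2),
}
labels = {"news": True, "wiki": True, "tweets": False, "forum": False}

print(classify_from_neighbors("legal", vectors, labels))  # True
# Second alternative: use the neighbors themselves as pretraining candidates.
print(nearest_neighbors("legal", vectors, k=2))  # ['news', 'wiki']
```

In this toy case, the three neighbors of `legal` are `news`, `wiki`, and `forum`. Two of the three are labeled suitable, so the majority vote returns `True`.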