US 12,073,322 B2
Computer-implemented training method, classification method and system and computer-readable recording medium
Wim Abbeloos, Hove (BE); Gabriel Othmezouri, Ixelles (BE); Wouter Van Gansbeke, Dilbeek (BE); Simon Vandenhende, Leuven (BE); Marc Proesmans, Lede (BE); Stamatios Georgoulis, Zurich (CH); and Luc Van Gool, Zurich (CH)
Assigned to TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota Aichi-ken (JP); and KATHOLIEKE UNIVERSITEIT LEUVEN, Leuven (BE)
Filed by Toyota Jidosha Kabushiki Kaisha, Toyota Aichi-ken (JP); and Katholieke Universiteit Leuven, Leuven (BE)
Filed on May 21, 2021, as Appl. No. 17/327,030.
Claims priority of application No. 20176143 (EP), filed on May 22, 2020.
Prior Publication US 2021/0365735 A1, Nov. 25, 2021
Int. Cl. G06N 3/08 (2023.01); G06F 18/10 (2023.01); G06F 18/214 (2023.01); G06F 18/23 (2023.01); G06F 18/2413 (2023.01)

CPC G06N 3/08 (2013.01) [G06F 18/10 (2023.01); G06F 18/2148 (2023.01); G06F 18/23 (2023.01); G06F 18/24137 (2023.01)]

20 Claims

1. A computer-implemented training method for training a classifier (Φη), wherein:

a transformed sample is a sample obtained by applying a transformation (T) to a source sample, the source sample being a datum of a source dataset (SD);

the training method comprises:

S10) training a pretext model (ΦΘ) to learn a pretext task, based on a source dataset (SD), by using a first training criterion which tends to minimize, across the source samples of the source dataset, a distance between an output of a source sample via the pretext model (ΦΘ) and an output of a corresponding transformed sample via the pretext model (ΦΘ);

S20) for at least one sample among the samples (Xi) of the source dataset (SD), determining a neighborhood (NXi) of the at least one sample;

wherein, for the at least one sample, the neighborhood (NXi) of the at least one sample comprises K closest neighbors of the sample, K being an integer, K ≥ 1, the K closest neighbors of the sample being the K samples Xj of the source dataset having the smallest distances between ΦΘ(Xi) and ΦΘ(Xj);

S30) training the classifier Φη to predict respective estimated probabilities Φηj(Xi), j=1 . . . C, for a sample to belong to respective clusters (Cj), by using a second training criterion which:

tends to maximize a likelihood that a sample and a neighbor (Xj) of the sample, the neighbor belonging to the neighborhood (NXi) of the sample, belong to the same cluster; and

tends to force the samples to be distributed over a plurality of clusters;

the second training criterion includes a summation:

−(1/|D|) Σ_{X∈D} Σ_{k∈N_X} f(<Φη(X), Φη(k)>)

where

f is an increasing continuous function, for instance a logarithm;

<, > is a dot product;

D is a dataset used for training the classifier at step S30; and

|D| is the number of samples in the dataset.
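
Steps S10 to S30 of claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not fix: the distance is taken to be squared Euclidean, a sample is excluded from its own neighborhood, f is the natural logarithm, and the term that spreads samples over clusters is the negative entropy of the mean cluster probabilities. The function names and the `entropy_weight` and `eps` parameters are hypothetical.

```python
import numpy as np


def pretext_distance(emb_src, emb_aug):
    """S10 (sketch): mean distance between the pretext outputs of source
    samples and of their transformed versions. Squared Euclidean distance
    is an assumption; the claim only requires 'a distance'."""
    return float(np.mean(np.sum((emb_src - emb_aug) ** 2, axis=1)))


def mine_neighbors(embeddings, k):
    """S20 (sketch): for each sample Xi, indices of the K samples Xj whose
    pretext outputs are closest to that of Xi."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    # Assumption: a sample is not counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def second_criterion(probs, neighbors, entropy_weight=1.0, eps=1e-8):
    """S30 (sketch): loss combining the claimed summation with a term that
    spreads samples over clusters.

    probs:     (N, C) array of estimated cluster probabilities per sample.
    neighbors: (N, K) integer array returned by mine_neighbors.
    """
    # Dot products <probs(X), probs(k)> for every neighbor k of every X.
    dots = np.einsum("ic,ikc->ik", probs, probs[neighbors])
    # Claimed summation with f = log (eps avoids log(0)).
    consistency = -np.sum(np.log(dots + eps)) / probs.shape[0]
    # Assumed spreading term: negative entropy of the mean probabilities,
    # which is smallest when the clusters are used evenly.
    mean_probs = probs.mean(axis=0)
    neg_entropy = np.sum(mean_probs * np.log(mean_probs + eps))
    return float(consistency + entropy_weight * neg_entropy)


if __name__ == "__main__":
    # Toy pretext outputs: two well-separated groups of two samples each.
    emb = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    nbrs = mine_neighbors(emb, k=1)
    probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
    print(nbrs.ravel(), second_criterion(probs, nbrs))
```

In this toy setting, a classifier whose predictions agree within each neighborhood and use both clusters evenly yields a lower loss than one that separates neighbors or assigns every sample to a single cluster.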