US 12,033,378 B2
Resolving training dataset category ambiguity
Or Shabtay, Tel-Aviv (IL); Eran Shlomo, Zichron Yaakov (IL); and Avi Yashar, Natanya (IL)
Assigned to DATALOOP LTD., Herzliya (IL)
Filed by DATALOOP LTD., Herzliya (IL)
Filed on Apr. 12, 2021, as Appl. No. 17/228,217.
Claims priority of provisional application 63/009,171, filed on Apr. 13, 2020.
Prior Publication US 2021/0319264 A1, Oct. 14, 2021
Int. Cl. G06V 10/94 (2022.01); G06F 18/21 (2023.01); G06F 18/213 (2023.01); G06F 18/214 (2023.01); G06F 18/2431 (2023.01); G06N 20/00 (2019.01); G06V 10/24 (2022.01); G06V 10/30 (2022.01); G06V 10/32 (2022.01)

CPC G06V 10/945 (2022.01) [G06F 18/213 (2023.01); G06F 18/2148 (2023.01); G06F 18/2185 (2023.01); G06F 18/2431 (2023.01); G06N 20/00 (2019.01); G06V 10/247 (2022.01); G06V 10/30 (2022.01); G06V 10/32 (2022.01)]

17 Claims

1. A system comprising:

at least one hardware processor; and

a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to:

receive, as input, a training dataset for training a machine learning model, comprising:

(i) a plurality of images, and

(ii) a set of classes associated with one or more objects in each of said images,

select at least one image from said dataset,

apply one or more transformations to said selected image, to create a set of transformed images, wherein each of said transformed images includes a representation of at least one of said one or more objects,

receive annotations with respect to at least some of said objects in said selected image and at least some of said transformed images, wherein said annotations comprise assigning each of said one or more objects to one of said classes,

calculate an ambiguity score with respect to at least one pair of classes in said set of classes, based, at least in part, on a number of times said annotations assigned one of said objects to both of said classes in said pair,

optimize said set of classes to generate an optimized set of classes, based, at least in part on all of said calculated ambiguity scores, and

construct a modified version of said training dataset comprising:

(iii) said plurality of images, and

(iv) said optimized set of classes.
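
The scoring and optimization steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim states only that the ambiguity score is based, at least in part, on the number of times the annotations assigned one object to both classes of a pair; the normalization by object count, the merge-above-threshold rule, the threshold value, and the names `ambiguity_scores` and `optimize_classes` are all assumptions made here for illustration. Annotations are assumed to arrive as a mapping from an object identifier to the labels it received across the selected image and its transformed versions.

```python
from collections import defaultdict
from itertools import combinations


def ambiguity_scores(annotations):
    """Score each class pair by how often one object received both labels.

    annotations: dict mapping object id -> list of class labels assigned to
    that object across the selected image and its transformed images.
    Assumption: the raw co-assignment count is divided by the object count.
    """
    counts = defaultdict(int)
    for labels in annotations.values():
        # Each unordered pair of distinct labels on one object counts once.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    n = len(annotations)
    return {pair: c / n for pair, c in counts.items()} if n else {}


def optimize_classes(classes, scores, threshold=0.5):
    """Merge classes whose pairwise score meets the threshold (assumed rule)."""
    parent = {c: c for c in classes}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), score in scores.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for c in classes:
        groups[find(c)].append(c)
    # Each merged group becomes a single class named after its members.
    return ["/".join(sorted(g)) for g in sorted(groups.values())]


# Hypothetical annotations for four objects.
classes = ["cat", "dog", "wolf"]
annotations = {
    "obj1": ["dog", "wolf", "dog"],
    "obj2": ["wolf", "dog"],
    "obj3": ["cat", "cat", "cat"],
    "obj4": ["dog", "dog"],
}
scores = ambiguity_scores(annotations)
optimized = optimize_classes(classes, scores)
print(scores)     # {('dog', 'wolf'): 0.5}
print(optimized)  # ['cat', 'dog/wolf']
```

In this example two of the four objects were labeled both "dog" and "wolf", so that pair scores 0.5 and is merged under the assumed threshold, while "cat" is left unchanged. The modified training dataset would then pair the original images with the optimized class list.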