US 12,266,150 B2
Automatically generating training data sets for object recognition
Dehua Cui, Redmond, WA (US); Albert Thambiratnam, Redmond, WA (US); Ming Zhong, Redmond, WA (US); and Wenhui Zhang, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Appl. No. 17/292,882
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
PCT Filed Dec. 12, 2018, PCT No. PCT/CN2018/120733 § 371(c)(1), (2) Date May 11, 2021, PCT Pub. No. WO2020/118584, PCT Pub. Date Jun. 18, 2020.
Prior Publication US 2021/0406595 A1, Dec. 30, 2021
Int. Cl. G06V 10/764 (2022.01); G06F 18/20 (2023.01); G06F 18/214 (2023.01); G06F 18/23 (2023.01); G06F 18/25 (2023.01); G06N 5/02 (2023.01); G06V 10/82 (2022.01); G06V 20/62 (2022.01); G06V 40/16 (2022.01)

CPC G06V 10/764 (2022.01) [G06F 18/214 (2023.01); G06F 18/23 (2023.01); G06F 18/251 (2023.01); G06F 18/29 (2023.01); G06N 5/02 (2013.01); G06V 10/82 (2022.01); G06V 20/62 (2022.01); G06V 40/16 (2022.01)]

17 Claims

1. A method for automatically generating a training data set for object recognition, comprising:

obtaining profile information for a plurality of objects; and

for each object from the plurality of objects:

collecting a group of initial images associated with the object based on an identity information of the object included in the profile information of the object;

filtering the group of initial images to obtain a group of filtered images associated with the object, wherein filtering the group of initial images further comprises, for each initial image:

calculating a first relevance score based on a similarity between the initial image and an image in the profile information of the object;

calculating a second relevance score based on a similarity between a description of the initial image and a description of the image in the profile information of the object;

determining that the initial image is a noisy image based on the first relevance score and the second relevance score; and

removing the initial image from the group of initial images in response to the determining that the initial image is a noisy image;

generating a group of training data pairs corresponding to the object by labeling each of the group of filtered images with the identity information of the object;

adding the group of training data pairs into the training data set; and

training an image recognition model based on the training data set, wherein the trained image recognition model is configured to perform image recognition for an input image.
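
The collecting, filtering, and labeling steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not state how either similarity is computed or how the two relevance scores are combined, so everything concrete here is an assumption made for illustration: the `Image` and `Profile` types, cosine similarity over precomputed feature vectors for the first score, token-overlap (Jaccard) similarity for the second score, the averaged-score rule with a `threshold` of 0.5, and the caller-supplied `collect` function standing in for an image search. The training step is left out because the claim does not specify a model.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Sequence, Tuple


@dataclass
class Image:
    features: Sequence[float]  # hypothetical precomputed feature vector
    description: str           # caption or surrounding text of the image


@dataclass
class Profile:
    identity: str  # identity information of the object, e.g. a name
    image: Image   # reference image (with description) from the profile


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: str, b: str) -> float:
    """Share of lowercase whitespace-separated tokens common to both texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_noisy(candidate: Image, profile: Profile, threshold: float = 0.5) -> bool:
    # First relevance score: initial image vs. image in the profile.
    first = cosine(candidate.features, profile.image.features)
    # Second relevance score: image description vs. profile image description.
    second = jaccard(candidate.description, profile.image.description)
    # Assumed decision rule: noisy when the mean of both scores is too low.
    return (first + second) / 2 < threshold


def build_training_set(
    profiles: List[Profile],
    collect: Callable[[str], List[Image]],
) -> List[Tuple[Image, str]]:
    training_set: List[Tuple[Image, str]] = []
    for profile in profiles:
        initial = collect(profile.identity)  # group of initial images
        filtered = [img for img in initial if not is_noisy(img, profile)]
        # Label each filtered image with the identity to form training pairs.
        training_set.extend((img, profile.identity) for img in filtered)
    return training_set
```

Under these assumptions, a call such as `build_training_set(profiles, collect)` returns `(image, identity)` pairs that could then be passed to whatever image recognition model is being trained. Other similarity measures or combination rules would fit the claim language equally well.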