US 12,190,588 B2
Occlusion-aware multi-object tracking
Dongdong Chen, Bellevue, WA (US); Qiankun Liu, Hefei (CN); Lu Yuan, Redmond, WA (US); and Lei Zhang, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 4, 2021, as Appl. No. 17/339,413.
Prior Publication US 2022/0391621 A1, Dec. 8, 2022
Int. Cl. G06V 20/52 (2022.01); G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06T 7/20 (2017.01); G06V 10/25 (2022.01); G06V 10/44 (2022.01)

CPC G06V 20/52 (2022.01) [G06F 18/2155 (2023.01); G06N 3/045 (2023.01); G06T 7/20 (2013.01); G06V 10/25 (2022.01); G06V 10/44 (2022.01); G06T 2207/30241 (2013.01)]

11 Claims

1. A system for tracking a target object across a plurality of image frames, comprising:

a logic machine; and

a storage machine holding instructions executable by the logic machine to:

calculate a trajectory for the target object over one or more previous frames occurring before a target frame, wherein the target object is detected by tracking, in a similarity matrix, comparison values indicating similarity between object feature data for a first set of objects detected in a first previous frame, the first set of objects including the target object, and object feature data for a second set of objects in a second previous frame, wherein the similarity matrix includes:

a row for each object in a union of both of the first set of objects and the second set of objects; and

a column for each object in the union,

wherein each matrix element of the similarity matrix represents one comparison value between a pair of objects drawn from the union;

responsive to assessing no detection of the target object in the target frame:

upon determining that the target object is not detected in the target frame due to being occluded by a set of one or more other objects, predict an estimated region for the target object based on the trajectory;

predict an occlusion center based on a set of candidate occluding locations for the set of other objects within a threshold distance of the estimated region, each location of the set of candidate occluding locations overlapping with the estimated region; and

automatically estimate a bounding box for the target object in the target frame based on the occlusion center, wherein the bounding box is estimated via a trained machine learning system trained via supervised learning with image data and ground-truth bounding boxes.
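
The geometric steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every function name in it is hypothetical. It rests on several assumptions that the claim does not state: the union of the two detection sets is formed by simple concatenation, comparison values are cosine similarities between feature vectors, the estimated region comes from constant-velocity extrapolation of the last two trajectory boxes, each candidate occluding location is the intersection rectangle between the estimated region and a nearby box, and the occlusion center is the mean of the centers of those rectangles. The final bounding-box estimate, which the claim assigns to a trained machine learning system, is left out.

```python
import math
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed choice of comparison value."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(first: Sequence[Sequence[float]],
                      second: Sequence[Sequence[float]]) -> List[List[float]]:
    """One row and one column per object in the union of both sets.

    Assumption: the union is the concatenation of the two detection lists.
    """
    union = list(first) + list(second)
    return [[cosine(a, b) for b in union] for a in union]


def center(box: Box) -> Point:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Overlap rectangle of two boxes, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def predict_region(trajectory: Sequence[Box]) -> Box:
    """Estimated region from the trajectory.

    Assumption: constant-velocity extrapolation of the last two boxes.
    """
    if len(trajectory) < 2:
        return trajectory[-1]
    prev, last = trajectory[-2], trajectory[-1]
    return tuple(2 * l - p for p, l in zip(prev, last))


def occlusion_center(region: Box, others: Sequence[Box],
                     max_dist: float) -> Optional[Point]:
    """Mean center of the overlaps between the region and nearby boxes.

    Assumption: a box is "within a threshold distance" when its center
    lies within max_dist of the region center.
    """
    rc = center(region)
    candidates = []
    for box in others:
        overlap = intersection(region, box)
        if overlap is not None and math.dist(rc, center(box)) <= max_dist:
            candidates.append(center(overlap))
    if not candidates:
        return None
    n = len(candidates)
    return (sum(c[0] for c in candidates) / n,
            sum(c[1] for c in candidates) / n)
```

In a full tracker the occlusion center would be passed, together with image features, to the trained bounding-box estimator recited in the last clause of the claim; that component is not modeled here.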