US 12,367,656 B2
Method and system for semi-supervised state transition detection for object tracking
Matthew A. Shreve, Campbell, CA (US); Robert R. Price, Palo Alto, CA (US); Jeyasri Subramanian, Sunnyvale, CA (US); and Sumeet Menon, Baltimore, MD (US)
Assigned to Xerox Corporation, Norwalk, CT (US)
Filed by Palo Alto Research Center Incorporated, Palo Alto, CA (US)
Filed on Sep. 8, 2022, as Appl. No. 17/940,884.
Prior Publication US 2024/0087287 A1, Mar. 14, 2024
Int. Cl. G06V 10/764 (2022.01); G06V 10/26 (2022.01); G06V 10/774 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01)
CPC G06V 10/764 (2022.01) [G06V 10/26 (2022.01); G06V 10/7753 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01)] 20 Claims
1. A computer-implemented method, comprising:
receiving an input video and a first annotated image from the input video, wherein the first annotated image identifies an object of interest in the input video;
initiating a tracker based on the first annotated image and the input video beginning from a start of the input video;
generating, by the tracker based on the first annotated image and the input video, information including: a sliding window for false positives; a first set of unlabeled images from the input video; and at least two images in which the object of interest is labeled with its corresponding state;
classifying, by a semi-supervised classifier based on the information, the first set of unlabeled images from the input video;
responsive to determining that a first unlabeled image is classified as a false positive, reinitiating the tracker based on a second annotated image and the input video beginning from a frame with the second annotated image, wherein the frame with the second annotated image occurs in the input video prior to a frame with the first unlabeled image classified as a false positive; and
generating an output video comprising the input video displayed with tracking on the object of interest, wherein the object of interest in each image from the input video is annotated and labeled with its corresponding state.
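The control flow recited in claim 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented system's implementation. The tracker is a hypothetical callable `track(template_frame, start_frame)` that returns one feature vector per frame. In place of the unspecified semi-supervised classifier, the sketch uses a simple nearest-centroid self-training scheme in which frames far from every labeled state are flagged as false positives. The claim does not say how the "sliding window for false positives" is built; here, known false-positive crops could be passed in as an extra labeled class. All names (`self_training_classify`, `track_with_reinit`, `demo`, `max_dist`, `FALSE_POSITIVE`) are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Feature = Tuple[float, ...]          # hypothetical per-frame crop descriptor
FALSE_POSITIVE = "false_positive"    # illustrative label for tracker errors


def _dist2(a: Feature, b: Feature) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: List[Feature]) -> Feature:
    return tuple(sum(c) / len(points) for c in zip(*points))


def self_training_classify(labeled: Dict[str, List[Feature]],
                           unlabeled: Dict[int, Feature],
                           max_dist: float) -> Dict[int, str]:
    """Nearest-centroid self-training (a stand-in for the claimed
    semi-supervised classifier): repeatedly adopt the unlabeled frame closest
    to any class centroid, add it to that class, and recompute centroids."""
    pools = {lab: list(pts) for lab, pts in labeled.items() if pts}
    remaining = dict(unlabeled)
    result: Dict[int, str] = {}
    while remaining and pools:
        centroids = {lab: _centroid(pts) for lab, pts in pools.items()}
        d2, frame, lab = min((_dist2(ft, c), fr, lb)
                             for fr, ft in remaining.items()
                             for lb, c in centroids.items())
        if d2 > max_dist ** 2:
            break                       # no remaining frame is confidently labeled
        result[frame] = lab
        pools[lab].append(remaining.pop(frame))
    # frames far from every known state are treated as false positives
    for frame in remaining:
        result[frame] = FALSE_POSITIVE
    return result


def track_with_reinit(annotations: Dict[int, str],
                      track: Callable[[int, int], Dict[int, Feature]],
                      classify: Callable[[Dict[int, Feature]], Dict[int, str]]
                      ) -> Dict[int, str]:
    """annotations: annotated frame index -> state.
    track(template_frame, start_frame): hypothetical tracker returning a crop
    feature for every frame from start_frame to the end of the video."""
    ordered = sorted(annotations)
    template, start = ordered[0], 0     # first pass starts at the beginning of the video
    used = {template}
    output: Dict[int, str] = {}
    while True:
        preds = classify(track(template, start))
        output.update(preds)            # later passes overwrite frames >= their start
        fps = sorted(f for f, lab in preds.items() if lab == FALSE_POSITIVE)
        # restart candidates: unused annotated frames before the first false positive
        earlier = [a for a in ordered if fps and a < fps[0] and a not in used]
        if not earlier:
            break                       # no false positive, or nothing left to restart from
        template = start = earlier[-1]
        used.add(template)
    output.update(annotations)          # annotated frames keep their ground-truth state
    return output


def demo(annotations: Dict[int, str]) -> Dict[int, str]:
    """Toy 10-frame video: state 'open' (feature 0.0), then 'closed' (10.0)
    from frame 5; a tracker started from frame 0's template drifts at frame 6."""
    def true_feat(i: int) -> Feature:
        return (0.0,) if i < 5 else (10.0,)

    def toy_track(template: int, start: int) -> Dict[int, Feature]:
        return {i: (50.0,) if template == 0 and i >= 6 else true_feat(i)
                for i in range(start, 10)}

    seeds = {"open": [(0.0,)], "closed": [(10.0,)]}
    return track_with_reinit(
        annotations, toy_track,
        lambda crops: self_training_classify(seeds, crops, max_dist=3.0))


if __name__ == "__main__":
    result = demo({0: "open", 5: "closed"})
    print([result[i] for i in range(10)])  # 5 x 'open', then 5 x 'closed'
```

In the toy run, the first pass flags frames 6 to 9 as false positives. The loop then restarts from the annotated frame 5, which precedes the first false positive, and the second pass labels those frames "closed". Restricting restarts to unused annotated frames that precede the first false positive mirrors the claim's reinitialization condition, and because each annotation is used as a template at most once, the loop always terminates. Rendering the output video (drawing the tracked box and state label on each frame) is omitted.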