US 12,131,537 B2
System and method for sentence directed video object codetection
Jeffrey Mark Siskind, West Lafayette, IN (US); and Haonan Yu, Sunnyvale, CA (US)
Filed by Purdue Research Foundation, West Lafayette, IN (US)
Filed on Dec. 29, 2020, as Appl. No. 17/135,995.
Application 17/135,995 is a continuation of application No. 16/323,179, abandoned, previously published as PCT/US2017/036232, filed on Jun. 6, 2017.
Claims priority of provisional application 62/346,459, filed on Jun. 6, 2016.
Prior Publication US 2022/0207272 A1, Jun. 30, 2022
Prior Publication US 2023/0410504 A9, Dec. 21, 2023
Int. Cl. G06V 20/40 (2022.01); G06F 18/20 (2023.01); G06F 18/22 (2023.01); G06T 7/20 (2017.01); G06T 7/73 (2017.01)
CPC G06V 20/41 (2022.01) [G06F 18/22 (2023.01); G06F 18/29 (2023.01); G06T 7/20 (2013.01); G06T 7/73 (2017.01); G06T 2207/10016 (2013.01); G06T 2207/10024 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A method for determining the locations and types of objects in a plurality of videos, comprising:
using a computer processor, receiving a plurality of videos;
pairing each of the videos with one or more sentences;
using the processor, determining the locations and types of the objects in the plurality of videos by:
a. using one or more object proposal mechanisms to propose locations for possible objects in one or more frames of the videos;
b. using one or more object trackers to track the positions of the proposed object locations forward or backward in time;
c. collecting the tracked proposal positions for each proposal into a tube, wherein a tube is a collection of tracked proposal positions; and
d. computing features for tracked proposal positions of a plurality of tubes,
wherein no use is made of a pretrained object detector.
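The claimed method builds candidate object "tubes" without any pretrained detector: boxes are proposed per frame, propagated through time by a tracker, collected into tubes, and featurized. The Python sketch below illustrates one possible reading of steps (a) through (d). It is not the patented implementation: the names `propose_boxes`, `track_box`, `build_tubes`, and `tube_features` are hypothetical, and the proposal and tracking bodies are deliberately trivial stand-ins, since the claim does not prescribe particular mechanisms beyond excluding pretrained object detectors.

```python
# Minimal sketch of claim steps (a)-(d), assuming a video is a list of
# H x W x 3 uint8 numpy arrays. Proposal/tracking logic here is a
# placeholder; the claim only requires that no pretrained detector is used.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int

# Step (c): a tube is a collection of tracked proposal positions,
# represented here as one Box per frame.
Tube = List[Box]

def propose_boxes(frame: np.ndarray, n: int = 10, rng=np.random) -> List[Box]:
    """Stand-in proposal mechanism (step a): sample boxes uniformly at
    random. A real system might use a class-agnostic method such as
    selective search or edge-based proposals."""
    h, w = frame.shape[:2]
    boxes = []
    for _ in range(n):
        bw, bh = rng.randint(16, w // 2), rng.randint(16, h // 2)
        x, y = rng.randint(0, w - bw), rng.randint(0, h - bh)
        boxes.append(Box(x, y, bw, bh))
    return boxes

def track_box(prev_frame: np.ndarray, next_frame: np.ndarray, box: Box) -> Box:
    """Stand-in tracker (step b): propagate the box unchanged. A real
    system would use a generic single-object tracker (e.g. optical flow
    or a correlation-filter tracker) run forward or backward in time."""
    return Box(box.x, box.y, box.w, box.h)

def build_tubes(frames: List[np.ndarray], anchor: int = 0) -> List[Tube]:
    """Steps (a)-(c): propose boxes in an anchor frame, track each
    proposal through the remaining frames, collect positions into tubes."""
    tubes: List[Tube] = []
    for box in propose_boxes(frames[anchor]):
        tube: Tube = [box]
        for prev, nxt in zip(frames[anchor:], frames[anchor + 1:]):
            box = track_box(prev, nxt, box)
            tube.append(box)
        tubes.append(tube)
    return tubes

def tube_features(frames: List[np.ndarray], tube: Tube, bins: int = 8) -> np.ndarray:
    """Step (d): compute a feature vector per tracked position; here a
    pixel-intensity histogram over each cropped region, averaged over
    the tube. Real features could be appearance or pose descriptors."""
    feats = []
    for frame, b in zip(frames, tube):
        crop = frame[b.y:b.y + b.h, b.x:b.x + b.w]
        hist, _ = np.histogram(crop, bins=bins, range=(0, 255))
        feats.append(hist / max(hist.sum(), 1))
    return np.mean(feats, axis=0)
```

Under this reading, the per-tube features would subsequently be scored against the paired sentences to select, for each video, the tubes whose contents best satisfy the sentence's semantics; that scoring stage is outside the scope of this exemplary claim.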