CPC G06V 20/41 (2022.01) [G06F 18/22 (2023.01); G06F 18/29 (2023.01); G06T 7/20 (2013.01); G06T 7/73 (2017.01); G06T 2207/10016 (2013.01); G06T 2207/10024 (2013.01)] | 17 Claims
1. A method for determining the locations and types of objects in a plurality of videos, comprising:
using a computer processor, receiving a plurality of videos;
pairing each of the videos with one or more sentences;
using the processor, determining the locations and types of the objects in the plurality of videos by:
a. using one or more object proposal mechanisms to propose locations for possible objects in one or more frames of the videos;
b. using one or more object trackers to track the positions of the proposed object locations forward or backward in time;
c. collecting the tracked proposal positions for each proposal into a tube, wherein a tube is a collection of tracked proposal positions; and
d. computing features for tracked proposal positions of a plurality of tubes,
wherein no use is made of a pretrained object detector.
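
For illustration only, steps a through d can be sketched as a small pipeline. Everything in the sketch is an assumption made for the example, not the claimed implementation: the threshold-based proposal, the brightest-window tracker, the center-and-displacement features, and all function names are hypothetical stand-ins. The sentence pairing step is omitted. Consistent with the final limitation, no pretrained object detector appears anywhere.

```python
from typing import List, Tuple

Frame = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def propose_boxes(frame: Frame, threshold: float = 0.5) -> List[Box]:
    """Step a (stand-in): propose one box around all pixels above threshold."""
    pts = [(x, y) for y, row in enumerate(frame)
           for x, v in enumerate(row) if v > threshold]
    if not pts:
        return []
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return [(min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)]


def box_sum(frame: Frame, box: Box) -> float:
    """Sum of pixel values inside the box."""
    x, y, w, h = box
    return sum(frame[j][i] for j in range(y, y + h) for i in range(x, x + w))


def track_step(frame: Frame, box: Box, radius: int = 1) -> Box:
    """Step b (stand-in): shift the box by up to `radius` pixels to the
    in-bounds position with the highest summed intensity in `frame`."""
    x, y, w, h = box
    rows, cols = len(frame), len(frame[0])
    best, best_score = box, float("-inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx and nx + w <= cols and 0 <= ny and ny + h <= rows:
                score = box_sum(frame, (nx, ny, w, h))
                if score > best_score:
                    best, best_score = (nx, ny, w, h), score
    return best


def build_tube(frames: List[Frame], seed_index: int, box: Box) -> List[Box]:
    """Step c: track forward and backward in time from the seed frame and
    collect one tracked position per frame into a tube."""
    tube: List[Box] = [box] * len(frames)
    for t in range(seed_index + 1, len(frames)):   # forward in time
        tube[t] = track_step(frames[t], tube[t - 1])
    for t in range(seed_index - 1, -1, -1):        # backward in time
        tube[t] = track_step(frames[t], tube[t + 1])
    return tube


def tube_features(tube: List[Box]) -> List[Tuple[float, float, float, float]]:
    """Step d (stand-in): per-position features (center x, center y, and
    displacement of the center since the previous frame)."""
    feats = []
    prev = None
    for x, y, w, h in tube:
        cx, cy = x + w / 2, y + h / 2
        vx, vy = (0.0, 0.0) if prev is None else (cx - prev[0], cy - prev[1])
        feats.append((cx, cy, vx, vy))
        prev = (cx, cy)
    return feats


def make_frame(size: int, x: int, y: int) -> Frame:
    """Synthetic test frame: a 2x2 bright blob with top-left corner (x, y)."""
    frame = [[0.0] * size for _ in range(size)]
    for j in range(y, y + 2):
        for i in range(x, x + 2):
            frame[j][i] = 1.0
    return frame


# Synthetic video: the blob moves right by one pixel per frame.
frames = [make_frame(8, 1 + t, 2) for t in range(4)]
seed = 1  # proposals are generated in this frame, then tracked both ways
tubes = [build_tube(frames, seed, b) for b in propose_boxes(frames[seed])]
features = [tube_features(tube) for tube in tubes]
```

In this toy setting each proposal yields exactly one tube with one position per frame. A practical system would likely generate many proposals per frame and therefore many tubes per video, with richer features computed at each tracked position.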