CPC G06T 7/215 (2017.01) [G06T 7/248 (2017.01); G06T 7/74 (2017.01); G06V 10/764 (2022.01); G06V 20/40 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] | 20 Claims |
1. A method performed by one or more data processing apparatus, the method comprising:
processing a video comprising a plurality of video frames to generate, for each video frame of the plurality of video frames, a corresponding optical flow image characterizing a displacement of each pixel of the video frame between the video frame and a subsequent video frame in the video;
for each optical flow image, processing the optical flow image using an optical flow object localization system to generate object localization data defining locations of objects depicted in the video frame corresponding to the optical flow image; and
using: (i) the plurality of video frames, and (ii) the object localization data generated by the optical flow object localization system by processing the optical flow images corresponding to the plurality of video frames, training a visual object localization system to process a video frame to generate object localization data defining locations of objects depicted in the video frame, wherein training the visual object localization system comprises, for one or more of the plurality of video frames:
determining target object localization data for the video frame based on the object localization data generated by processing the optical flow image corresponding to the video frame using the optical flow object localization system; and
training the visual object localization system to process the video frame to generate object localization data for the video frame that matches the target object localization data for the video frame.
|
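As an illustration only, the data flow recited in claim 1 can be sketched as follows: optical flow images are turned into localization data, which then serve as training targets for a localizer that sees only the video frames. The claim does not specify how either localization system is implemented. The function names `flow_to_box`, `build_training_pairs`, and `localization_loss`, the magnitude-threshold heuristic, the bounding-box representation, and the squared-error loss are all assumptions introduced for this sketch, not features taken from the patent.

```python
import numpy as np


def flow_to_box(flow, threshold=1.0):
    """Hypothetical stand-in for the optical flow object localization system.

    flow: array of shape (H, W, 2) holding each pixel's displacement to the
    subsequent frame. Returns a box (x_min, y_min, x_max, y_max) enclosing
    all pixels whose displacement magnitude exceeds `threshold`, or None if
    no pixel moves that much. The thresholding rule is an assumption.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    ys, xs = np.nonzero(magnitude > threshold)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))


def build_training_pairs(frames, flows, threshold=1.0):
    """Pair each video frame with target localization data derived from
    the optical flow image that corresponds to it.

    Frames whose flow yields no box are skipped, mirroring the claim's
    "one or more of the plurality of video frames".
    """
    pairs = []
    for frame, flow in zip(frames, flows):
        target = flow_to_box(flow, threshold)
        if target is not None:
            pairs.append((frame, target))
    return pairs


def localization_loss(predicted_box, target_box):
    """Mean squared error between predicted and target box coordinates.

    An assumed objective; minimizing it pushes the visual localizer's
    output to match the flow-derived target.
    """
    predicted = np.asarray(predicted_box, dtype=float)
    target = np.asarray(target_box, dtype=float)
    return float(np.mean((predicted - target) ** 2))
```

In this sketch, a visual object localization system would be any trainable model mapping a frame to a box, updated to reduce `localization_loss` over the pairs returned by `build_training_pairs`. The optical flow images themselves are taken as given inputs here.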