US 11,669,977 B2
Processing images to localize novel objects
Susanna Maria Ricco, Redwood City, CA (US); and Bryan Andrew Seybold, San Francisco, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Mar. 26, 2021, as Appl. No. 17/214,327.
Application 17/214,327 is a continuation of application No. 16/264,222, filed on Jan. 31, 2019, granted, now 10,991,122.
Claims priority of provisional application 62/760,594, filed on Nov. 13, 2018.
Prior Publication US 2021/0217197 A1, Jul. 15, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 7/73 (2017.01); G06T 7/215 (2017.01); G06T 7/246 (2017.01); G06V 10/764 (2022.01); G06V 20/40 (2022.01)

CPC G06T 7/215 (2017.01) [G06T 7/248 (2017.01); G06T 7/74 (2017.01); G06V 10/764 (2022.01); G06V 20/40 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)]

20 Claims

1. A method performed by one or more data processing apparatus, the method comprising:

processing a video comprising a plurality of video frames to generate, for each video frame of the plurality of video frames, a corresponding optical flow image characterizing a displacement of each pixel of the video frame between the video frame and a subsequent video frame in the video;

for each optical flow image, processing the optical flow image using an optical flow object localization system to generate object localization data defining locations of objects depicted in the video frame corresponding to the optical flow image; and

using: (i) the plurality of video frames, and (ii) the object localization data generated by the optical flow object localization system by processing the optical flow images corresponding to the plurality of video frames, training a visual object localization system to process a video frame to generate object localization data defining locations of objects depicted in the video frame, wherein training the visual object localization system comprises, for one or more of the plurality of video frames:

determining target object localization data for the video frame based on the object localization data generated by processing the optical flow image corresponding to the video frame using the optical flow object localization system; and

training the visual object localization system to process the video frame to generate object localization data for the video frame that matches the target object localization data for the video frame.
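The optical flow element of claim 1 can be illustrated with a small sketch. The claim does not say how the optical flow images are computed. The example below swaps in a simple block-matching estimator, which may not be the technique the patent contemplates. It works on grayscale frames stored as nested Python lists and produces a flow image only for frames that have a subsequent frame. The names `block_matching_flow` and `video_to_flow` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]                 # grayscale frame, indexed [row][col]
FlowImage = List[List[Tuple[int, int]]]   # per-pixel (dy, dx) displacement


def block_matching_flow(frame: Frame, next_frame: Frame,
                        radius: int = 2, patch: int = 1) -> FlowImage:
    """Estimate an integer (dy, dx) displacement for every pixel of `frame`.

    Each pixel's (2*patch+1) x (2*patch+1) neighbourhood is compared with
    neighbourhoods in `next_frame` shifted by up to `radius` pixels. The shift
    with the smallest sum of absolute differences (SAD) wins, with ties broken
    toward the smallest displacement. Reads past the image border are clamped.
    """
    h, w = len(frame), len(frame[0])

    def px(img: Frame, r: int, c: int) -> float:
        # Clamp coordinates so patches near the border stay inside the image.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    flow: FlowImage = []
    for r in range(h):
        row: List[Tuple[int, int]] = []
        for c in range(w):
            best_key, best_shift = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sad = sum(
                        abs(px(frame, r + i, c + j)
                            - px(next_frame, r + dy + i, c + dx + j))
                        for i in range(-patch, patch + 1)
                        for j in range(-patch, patch + 1)
                    )
                    # Compare (SAD, squared shift length) so static regions map to (0, 0).
                    key = (sad, dy * dy + dx * dx)
                    if best_key is None or key < best_key:
                        best_key, best_shift = key, (dy, dx)
            row.append(best_shift)
        flow.append(row)
    return flow


def video_to_flow(frames: List[Frame]) -> List[FlowImage]:
    """One optical flow image per frame that has a subsequent frame."""
    return [block_matching_flow(cur, nxt) for cur, nxt in zip(frames, frames[1:])]
```

Block matching only yields integer displacements and is slow for real video. It is used here because it is short and easy to verify by hand.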
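Claim 1 also does not specify the internals of the optical flow object localization system. As a hypothetical stand-in, the sketch below works in three steps:

1. Mark pixels whose flow magnitude meets a threshold as moving.
2. Group the moving pixels into 4-connected regions.
3. Report each region's bounding box as the object localization data for the corresponding frame.

The function name, the threshold, and the box format `(top, left, bottom, right)` are all assumptions.

```python
import math
from collections import deque
from typing import List, Tuple

FlowImage = List[List[Tuple[float, float]]]  # per-pixel (dy, dx) displacement
Box = Tuple[int, int, int, int]              # (top, left, bottom, right), inclusive


def localize_from_flow(flow: FlowImage, min_magnitude: float = 0.5,
                       min_area: int = 1) -> List[Box]:
    """Return one bounding box per connected region of moving pixels."""
    h, w = len(flow), len(flow[0])
    # A pixel is "moving" if its displacement length reaches the threshold.
    moving = [[math.hypot(dy, dx) >= min_magnitude for dy, dx in row] for row in flow]
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if not moving[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected moving pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and moving[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop tiny regions, which are often flow noise.
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes
```

This heuristic only finds objects that move relative to the camera. That limitation is why the claim goes on to train a separate visual system on its outputs.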
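The training element of claim 1 uses the flow-derived localizations as targets (pseudo-labels) for a system that sees only the video frame. The sketch below rests on two hypothetical assumptions:

- The target object localization data is a per-pixel foreground mask rasterized from the flow-derived boxes.
- The visual object localization system is a one-feature logistic regression on pixel intensity, trained by full-batch gradient descent on mean cross-entropy.

A practical system would use a far richer model. The sketch is meant to show only the supervision pattern. All names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[List[float]]            # grayscale frame, indexed [row][col]
Box = Tuple[int, int, int, int]      # (top, left, bottom, right), inclusive
Mask = List[List[int]]


def boxes_to_mask(boxes: Sequence[Box], h: int, w: int) -> Mask:
    """Target localization data: 1 for pixels inside any flow-derived box."""
    mask = [[0] * w for _ in range(h)]
    for top, left, bottom, right in boxes:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                mask[r][c] = 1
    return mask


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_visual_localizer(frames: Sequence[Frame],
                           flow_boxes: Sequence[Sequence[Box]],
                           epochs: int = 2000, lr: float = 2.0) -> Tuple[float, float]:
    """Fit p(object | pixel) = sigmoid(a * intensity + b) to flow-derived targets."""
    # Pair every pixel intensity with its pseudo-label from the flow boxes.
    samples: List[Tuple[float, int]] = []
    for frame, boxes in zip(frames, flow_boxes):
        target = boxes_to_mask(boxes, len(frame), len(frame[0]))
        for row, trow in zip(frame, target):
            samples.extend(zip(row, trow))
    a = b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in samples:
            err = sigmoid(a * x + b) - y  # d(cross-entropy)/d(logit)
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def predict_mask(frame: Frame, params: Tuple[float, float],
                 threshold: float = 0.5) -> Mask:
    """Visual-only localization: no optical flow is used at inference time."""
    a, b = params
    return [[int(sigmoid(a * x + b) > threshold) for x in row] for row in frame]
```

After training, `predict_mask` localizes a bright object in a frame it has not seen, using the frame alone. This reflects the claim's requirement that the trained visual system process a video frame to generate object localization data.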