US 11,948,340 B2
Detecting objects in video frames using similarity detectors
Kun Yu, Shanghai (CN); Ciyong Chen, Shanghai (CN); Xiaotian Guo, Hefei (CN); Yan Hao, Shanghai (CN); Hui Li, Shanghai (CN); Lu Li, Shanghai (CN); Jianguo Pei, Shanghai (CN); and Zhi Yong Zhu, Shanghai (CN)
Assigned to Intel Corporation, Santa Clara, CA (US)
Appl. No. 17/255,331
Filed by Intel Corporation, Santa Clara, CA (US)
PCT Filed Sep. 7, 2018, PCT No. PCT/CN2018/104651 § 371(c)(1), (2) Date Dec. 22, 2020, PCT Pub. No. WO2020/047854, PCT Pub. Date Mar. 12, 2020.
Prior Publication US 2021/0271923 A1, Sep. 2, 2021
Int. Cl. G06V 10/25 (2022.01); G06F 18/21 (2023.01); G06F 18/22 (2023.01); G06N 3/045 (2023.01); G06V 10/62 (2022.01); G06V 10/74 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 20/52 (2022.01)

CPC G06V 10/25 (2022.01) [G06F 18/21 (2023.01); G06F 18/22 (2023.01); G06N 3/045 (2023.01); G06V 10/62 (2022.01); G06V 10/761 (2022.01); G06V 10/82 (2022.01); G06V 20/46 (2022.01); G06V 20/48 (2022.01); G06V 20/52 (2022.01)]

24 Claims

1. An apparatus for detecting objects in video frames, the apparatus comprising:

memory;

instructions; and

processor circuitry to execute the instructions to at least:

calculate first localization information and first confidence information for first potential object patches in a first frame of a plurality of video frames;

calculate second localization information and second confidence information for second potential object patches in an adjacent frame of the first frame in the plurality of video frames;

detect paired patches between the first frame and the adjacent frame based on a comparison of the first potential object patches and the second potential object patches; and

modify a first prediction result for a first paired patch in the adjacent frame, the first prediction result including a first confidence score associated with an object type, the first prediction result to be modified to correspond to a second prediction result of a corresponding second paired patch in the first frame having a second confidence score that is higher than the first confidence score of the first prediction result of the first paired patch in the adjacent frame.
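
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not specify how patches are compared, so intersection-over-union (IoU) of bounding boxes is used here purely as a stand-in similarity measure. The names `Patch`, `iou`, `pair_patches`, and `refine_adjacent`, and the threshold of 0.5, are hypothetical and chosen for illustration only. The per-frame detector that produces localization and confidence information is assumed to exist already and is represented by plain lists of `Patch` objects.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Patch:
    box: Box      # localization information
    label: str    # predicted object type
    score: float  # confidence score for that object type


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (assumed similarity measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pair_patches(first: List[Patch], adjacent: List[Patch],
                 threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Greedily pair each adjacent-frame patch with its most similar
    unused first-frame patch, if the similarity reaches the threshold."""
    pairs, used = [], set()
    for j, q in enumerate(adjacent):
        best_i, best_sim = None, threshold
        for i, p in enumerate(first):
            if i in used:
                continue
            sim = iou(p.box, q.box)
            if sim >= best_sim:
                best_i, best_sim = i, sim
        if best_i is not None:
            used.add(best_i)
            pairs.append((best_i, j))
    return pairs


def refine_adjacent(first: List[Patch], adjacent: List[Patch],
                    threshold: float = 0.5) -> List[Patch]:
    """For each paired patch, replace the adjacent-frame prediction with the
    first-frame prediction when the latter has the higher confidence."""
    refined = list(adjacent)
    for i, j in pair_patches(first, adjacent, threshold):
        if first[i].score > adjacent[j].score:
            # Keep the adjacent frame's own box; only the prediction changes.
            refined[j] = replace(adjacent[j],
                                 label=first[i].label,
                                 score=first[i].score)
    return refined


if __name__ == "__main__":
    first_frame = [Patch((10, 10, 50, 50), "dog", 0.9)]
    adjacent_frame = [Patch((12, 11, 52, 51), "cat", 0.3)]
    print(refine_adjacent(first_frame, adjacent_frame))
```

In this sketch only the adjacent frame's prediction is modified, and only when its paired patch in the first frame carries the higher confidence score, which mirrors the final limitation of the claim. Unpaired patches and pairs in which the adjacent frame is already more confident are left unchanged.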