US 12,277,171 B2
Video retrieval techniques using video contrastive learning
Xiao Xia Mao, Shanghai (CN); Wei Jun Zheng, Shanghai (CN); Shi Hui Gui, Shanghai (CN); and Xiao Feng Ji, Shanghai (CN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Mar. 7, 2023, as Appl. No. 18/179,617.
Prior Publication US 2024/0303272 A1, Sep. 12, 2024
Int. Cl. G06V 20/70 (2022.01); G06F 16/78 (2019.01); G06F 40/30 (2020.01); G06V 10/74 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 30/19 (2022.01)
CPC G06F 16/78 (2019.01) [G06F 40/30 (2020.01); G06V 10/761 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 20/70 (2022.01); G06V 30/19093 (2022.01)] 20 Claims
 
1. A method of training a neural network for finding and retrieving queried videos, comprising:
obtaining two video clips from a first dataset and providing the two video clips to two video encoders for training;
providing an output of each of the two video encoders to a cosine similarity calculator;
training a multi-mentor paradigm having at least two mentors by obtaining two textual inputs from a second dataset, wherein a first mentor is provided each textual input to provide a similarity value comparison and a second mentor is provided said two textual inputs to provide a word mover distance (WMD); and
using said output from said multi-mentor paradigm and said encoders, calculating a contrastive loss used to provide contrastive learning of video features for differentiating similarity and dissimilarity of video clips.
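In claim 1, the second mentor produces a word mover distance (WMD) between the two textual inputs. The Python sketch below is illustrative only and is not the patented implementation. It assumes that pre-trained word embeddings are supplied as a plain dictionary (the hypothetical `embeddings` argument) and that both texts have the same number of tokens, each with uniform weight. Under those assumptions the optimal transport plan can always be taken as a one-to-one matching (Birkhoff-von Neumann), so a brute-force search over permutations returns the exact WMD. Texts of different lengths require a general optimal-transport solver instead.

```python
import itertools
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two word-embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def word_movers_distance(
    tokens_a: List[str],
    tokens_b: List[str],
    embeddings: Dict[str, Vector],  # hypothetical word -> vector lookup
) -> float:
    """Exact WMD for two equal-length token lists with uniform weights.

    Each token carries mass 1/n. With equal uniform masses the optimal
    transport plan is a permutation, so trying all n! matchings is exact.
    This is only practical for short texts.
    """
    n = len(tokens_a)
    if n == 0 or n != len(tokens_b):
        raise ValueError("this sketch needs two non-empty token lists of equal length")
    # cost[i][j]: cost of moving token i of text A onto token j of text B
    cost = [[euclidean(embeddings[a], embeddings[b]) for b in tokens_b]
            for a in tokens_a]
    best = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    # Each matched pair moves mass 1/n, so average the matched costs.
    return best / n
```

With toy 2-D embeddings such as `{"man": (1.0, 0.0), "runs": (0.0, 1.0), "person": (1.0, 0.2), "jogs": (0.1, 1.0)}`, the pair "man runs" / "person jogs" receives a small distance of about 0.15, because each word is matched to its near-synonym. The same two words in a different order receive a distance of 0.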
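Claim 1 states that a cosine similarity is computed from the two video-encoder outputs and that this value is combined with the mentors' outputs to form a contrastive loss. The claim does not specify how the mentors are fused or which contrastive loss is used. The Python sketch below fills those gaps with explicitly hypothetical choices:

- `mentor_target` rescales the first mentor's text similarity to the range [0, 1].
- It converts the second mentor's WMD into a similarity using exp(-WMD).
- It averages the two values with an assumed weight `alpha`.
- `soft_contrastive_loss` is a soft-label variant of the standard margin-based contrastive loss, where the video distance is defined as 1 minus the cosine similarity.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embeddings (e.g. two video-encoder outputs)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def mentor_target(text_similarity: float, wmd: float, alpha: float = 0.5) -> float:
    """Hypothetical fusion of the two mentors into a soft label in [0, 1].

    Mentor 1's cosine similarity is mapped from [-1, 1] to [0, 1];
    mentor 2's WMD is mapped to a similarity via exp(-wmd).
    """
    s1 = (text_similarity + 1.0) / 2.0
    s2 = math.exp(-wmd)
    return alpha * s1 + (1.0 - alpha) * s2


def soft_contrastive_loss(video_similarity: float, target: float,
                          margin: float = 1.0) -> float:
    """Margin-based contrastive loss with a soft label `target` in [0, 1].

    Distance d = 1 - cosine similarity. A target of 1 pulls the clips
    together (d**2 term); a target of 0 pushes them at least `margin`
    apart (squared hinge term).
    """
    d = 1.0 - video_similarity
    return target * d ** 2 + (1.0 - target) * max(0.0, margin - d) ** 2
```

For example, two identical clip embeddings yield a loss of 0 when both mentors judge the captions to be the same (target 1.0). The same pair yields a loss of 1.0 when the target is 0. In this way, the text-side mentors decide whether a pair of clips should be pulled together or pushed apart.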