US 12,149,716 B2
Contrastive object representation learning from temporal data
Frank Brongers, Amsterdam (NL); Phillip Lippe, Amsterdam (NL); and Sara Magliacane, Amsterdam (NL)
Assigned to QUALCOMM Technologies, Inc., San Diego, CA (US)
Filed by QUALCOMM Technologies, Inc., San Diego, CA (US)
Filed on Feb. 25, 2022, as Appl. No. 17/681,675.
Prior Publication US 2023/0308666 A1, Sep. 28, 2023
Int. Cl. H04N 19/20 (2014.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); H04N 19/136 (2014.01); H04N 19/436 (2014.01)
CPC H04N 19/20 (2014.11) [G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); H04N 19/436 (2014.11)] 14 Claims
OG exemplary drawing
 
1. A computer-implemented method, comprising:
receiving, by an artificial neural network (ANN), a video, the video comprising a temporal sequence of frames including images of one or more objects, wherein the ANN comprises a first convolutional recurrent layer and a second convolutional recurrent layer;
processing the temporal sequence of frames using the first convolutional recurrent layer and the second convolutional recurrent layer to extract a temporal shift of features between multiple frames of the temporal sequence of frames;
generating, by the ANN, object representations corresponding to the one or more objects of a current frame based on at least one previous frame and the temporal shift of features between the multiple frames of the temporal sequence of frames, the ANN being configured to generate the object representations according to a noise contrastive estimation loss, wherein the object representations describe the one or more objects in accordance with differences or similarities in the one or more objects in an environment of the current frame of the video relative to the one or more objects in the environment of a different frame of the video; and
communicating the object representations to a receiver.
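
The following is a minimal, hedged sketch of the pipeline the claim recites: two stacked convolutional recurrent layers process the frame sequence to capture a temporal shift of features, per-object representations are produced for each frame, and training uses an InfoNCE-style noise contrastive estimation loss between corresponding objects of the current and a previous frame. It is not the patented implementation; all module names, hyperparameters, and the specific slot/pooling scheme (ConvGRUCell, TemporalObjectEncoder, num_slots, slot_dim) are illustrative assumptions.

```python
# Illustrative sketch only -- not the patented implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvGRUCell(nn.Module):
    """Convolutional recurrent layer: a GRU cell whose gates are convolutions."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        pad = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x, h):
        if h is None:
            h = torch.zeros(x.size(0), self.hid_ch, x.size(2), x.size(3),
                            device=x.device)
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], 1))), 2, 1)
        n = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * n


class TemporalObjectEncoder(nn.Module):
    """Two stacked convolutional recurrent layers followed by per-object heads."""
    def __init__(self, in_ch=3, hid_ch=32, num_slots=4, slot_dim=64):
        super().__init__()
        self.rnn1 = ConvGRUCell(in_ch, hid_ch)    # first convolutional recurrent layer
        self.rnn2 = ConvGRUCell(hid_ch, hid_ch)   # second convolutional recurrent layer
        # One linear head producing num_slots object representations per frame
        # (an assumed, simplified object-representation head).
        self.heads = nn.Linear(hid_ch, num_slots * slot_dim)
        self.num_slots, self.slot_dim = num_slots, slot_dim

    def forward(self, frames):                     # frames: (B, T, C, H, W)
        h1 = h2 = None
        reps = []
        for t in range(frames.size(1)):
            h1 = self.rnn1(frames[:, t], h1)       # features carried across frames
            h2 = self.rnn2(h1, h2)                 # captures the temporal feature shift
            pooled = h2.mean(dim=(2, 3))           # (B, hid_ch)
            slots = self.heads(pooled).view(-1, self.num_slots, self.slot_dim)
            reps.append(slots)
        return torch.stack(reps, dim=1)            # (B, T, num_slots, slot_dim)


def info_nce_loss(current, previous, temperature=0.1):
    """Noise contrastive estimation (InfoNCE-style): object representations of
    the current frame are matched against the corresponding objects of a
    previous frame; all other objects in the batch serve as negatives."""
    q = F.normalize(current.reshape(-1, current.size(-1)), dim=-1)
    k = F.normalize(previous.reshape(-1, previous.size(-1)), dim=-1)
    logits = q @ k.t() / temperature
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    video = torch.randn(2, 5, 3, 64, 64)             # (batch, frames, C, H, W)
    encoder = TemporalObjectEncoder()
    reps = encoder(video)
    loss = info_nce_loss(reps[:, 1:], reps[:, :-1])  # current frame vs. previous frame
    loss.backward()
    print(reps.shape, float(loss))
```

The contrastive pairing here (frame t versus frame t-1) is one plausible reading of generating representations "based on at least one previous frame" under a noise contrastive estimation loss; the patent's actual pairing, slot mechanism, and transmission to a receiver may differ.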