| CPC G06V 20/41 (2022.01) [G06N 20/00 (2019.01); G06V 30/194 (2022.01)] | 20 Claims |

|
1. A method for multi-view self-supervised learning, comprising:
receiving a plurality of video sequences, the video sequences comprising a plurality of image frames;
applying selected images of a first and second video sequence of the plurality of video sequences to a plurality of different encoders to derive a plurality of embeddings for different views of the selected images of the first and second video sequences, the plurality of embeddings comprising RGB embeddings, flow embeddings, and KeyPoint embeddings;
determining distances of the derived plurality of embeddings for the selected images of the first and second video sequences;
detecting inconsistencies between distances of the RGB embeddings, distances of the flow embeddings, and distances of the KeyPoint embeddings outside a threshold distance; and
predicting semantics of a future image based on the determined distances.
|
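The distance and inconsistency steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The claim does not specify a distance metric, the form of the consistency check, or a threshold value, so the Euclidean distance, the max-minus-min spread rule, the threshold of 2.0, and the names `pairwise_distances` and `detect_inconsistencies` are all assumptions made for illustration. Fixed arrays stand in for the outputs of the RGB, flow, and KeyPoint encoders, and the final prediction step is left out.

```python
import numpy as np

VIEWS = ("rgb", "flow", "keypoint")


def pairwise_distances(emb_a, emb_b):
    """Euclidean distance between corresponding rows (assumed metric)."""
    return np.linalg.norm(np.asarray(emb_a, float) - np.asarray(emb_b, float), axis=1)


def detect_inconsistencies(dists, threshold):
    """Flag frame pairs whose per-view distances disagree by more than threshold.

    The spread (max minus min across views) exceeds the threshold exactly
    when at least one pair of views differs by more than the threshold.
    This rule is an assumption, not taken from the claim.
    """
    stacked = np.stack([dists[v] for v in VIEWS])
    spread = stacked.max(axis=0) - stacked.min(axis=0)
    return spread > threshold


# Hypothetical embeddings for three selected frames of two video sequences.
# In practice these would come from three separate encoders.
seq1 = {
    "rgb": [[0, 0], [1, 0], [0, 0]],
    "flow": [[0, 0], [0, 0], [0, 0]],
    "keypoint": [[0, 0], [0, 0], [0, 0]],
}
seq2 = {
    "rgb": [[0, 1], [1, 0], [3, 4]],
    "flow": [[0, 1], [0, 0], [0, 1]],
    "keypoint": [[1, 0], [0, 0], [0, 1]],
}

# One distance per selected frame pair, computed separately for each view.
dists = {v: pairwise_distances(seq1[v], seq2[v]) for v in VIEWS}

# Assumed threshold; the claim only requires that some threshold exists.
flags = detect_inconsistencies(dists, threshold=2.0)
print(flags.tolist())
```

In this toy data the three views agree on the first two frame pairs, while the RGB distance for the third pair is much larger than the flow and KeyPoint distances, so only the third pair is flagged.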