| CPC G06V 20/48 (2022.01) [G06V 10/761 (2022.01); G06V 10/771 (2022.01); G06V 10/82 (2022.01)] | 23 Claims |

|
1. A method implemented by one or more processors, the method comprising:
processing a sequence of video frames capturing a periodic activity, using an encoder portion of a repetition network, to generate a sequence of encoded video frames, wherein processing the sequence of video frames capturing the periodic activity, using the encoder portion of the repetition network, to generate the sequence of encoded video frames comprises:
for each video frame in the sequence of video frames:
processing the video frame using a first portion of the encoder to generate two dimensional features of the video frame;
processing the two dimensional features, of the video frame, using a second portion of the encoder, to generate temporal context features for the video frame; and
processing the temporal context features, of the video frame, using a third portion of the encoder, to generate a corresponding encoded video frame of the sequence of encoded video frames;
generating, based on the sequence of encoded video frames, a temporal self-similarity matrix indicating a pairwise similarity between encoded video frames in the sequence of encoded video frames; and
processing the temporal self-similarity matrix using a period predictor model portion of the repetition network, to generate (a) a period length of the periodic activity in the sequence of video frames and/or (b) a per frame periodicity classification of the sequence of video frames.
|
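For illustration only, the temporal self-similarity matrix recited in claim 1 can be sketched as follows. The claim does not fix a similarity measure, so this sketch assumes negative squared Euclidean distance followed by a row-wise softmax. The function names `temporal_self_similarity` and `naive_period_from_tsm`, the `temperature` parameter, and the diagonal-averaging heuristic are hypothetical. The heuristic is only a stand-in for the learned period predictor model portion, which the claim does not describe in implementable detail.

```python
import numpy as np


def temporal_self_similarity(encoded_frames, temperature=1.0):
    """Pairwise similarity between per-frame embeddings, shape (N, N).

    Assumption: similarity is the negative squared Euclidean distance,
    normalized with a row-wise softmax. The claim names no specific measure.
    """
    x = np.asarray(encoded_frames, dtype=np.float64)
    sq_norms = np.sum(x * x, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clamp tiny negative round-off
    logits = -sq_dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def naive_period_from_tsm(tsm, max_lag=None):
    """Hypothetical heuristic, NOT the claimed learned period predictor.

    Averages each off-diagonal of the matrix and returns the smallest lag
    (in frames) whose average similarity is within 1e-9 of the maximum.
    """
    n = tsm.shape[0]
    if n < 2:
        raise ValueError("need at least two frames")
    max_lag = max_lag or n // 2
    scores = np.array(
        [np.mean(np.diagonal(tsm, offset=lag)) for lag in range(1, max_lag + 1)]
    )
    # Prefer the fundamental period over its integer multiples.
    best = np.flatnonzero(scores >= scores.max() - 1e-9)[0]
    return int(best) + 1


# Toy embeddings that repeat every 4 frames, standing in for encoder output.
pattern = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
frames = np.tile(pattern, (4, 1))  # 16 frames
tsm = temporal_self_similarity(frames)
period = naive_period_from_tsm(tsm)
```

In this toy input the embeddings repeat exactly every four frames, so the off-diagonals at lags 4 and 8 carry the highest average similarity, and the heuristic returns the smaller lag. A trained period predictor would instead operate on the matrix as a whole and could also emit the per frame periodicity classification.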