US 12,469,290 B2
Class agnostic repetition counting in video(s) utilizing a temporal self-similarity matrix
Debidatta Dwibedi, Santa Clara, CA (US); Yusuf Aytar, London (GB); Jonathan Tompson, San Carlos, CA (US); Andrew Zisserman, Oxford (GB); and Pierre Sermanet, Palo Alto, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 18/008,204
Filed by GOOGLE LLC, Mountain View, CA (US)
PCT Filed Jun. 10, 2020, PCT No. PCT/US2020/036942
§ 371(c)(1), (2) Date Dec. 5, 2022,
PCT Pub. No. WO2021/251959, PCT Pub. Date Dec. 16, 2021.
Prior Publication US 2023/0274548 A1, Aug. 31, 2023
Int. Cl. G06V 20/40 (2022.01); G06V 10/74 (2022.01); G06V 10/771 (2022.01); G06V 10/82 (2022.01)
CPC G06V 20/48 (2022.01) [G06V 10/761 (2022.01); G06V 10/771 (2022.01); G06V 10/82 (2022.01)] 23 Claims
OG exemplary drawing
 
1. A method implemented by one or more processors, the method comprising:
processing a sequence of video frames capturing a periodic activity, using an encoder portion of a repetition network, to generate a sequence of encoded video frames, wherein processing the sequence of video frames capturing the periodic activity, using the encoder portion of the repetition network, to generate the sequence of encoded video frames comprises:
for each video frame in the sequence of video frames:
processing the video frame using a first portion of the encoder to generate two-dimensional features of the video frame;
processing the two-dimensional features, of the video frame, using a second portion of the encoder, to generate temporal context features for the video frame; and
processing the temporal context features, of the video frame, using a third portion of the encoder, to generate a corresponding encoded video frame of the sequence of encoded video frames;
generating, based on the sequence of encoded video frames, a temporal self-similarity matrix indicating a pairwise similarity between encoded video frames in the sequence of encoded video frames; and
processing the temporal self-similarity matrix using a period predictor model portion of the repetition network, to generate (a) a period length of the periodic activity in the sequence of video frames and/or (b) a per-frame periodicity classification of the sequence of video frames.
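
The encoder and self-similarity steps recited in claim 1 can be illustrated with a minimal PyTorch sketch. All names, layer choices (a small 2D convolution for the first portion, a 3D convolution supplying temporal context for the second portion, a linear projection for the third portion), and the use of softmax-normalized negative squared Euclidean distances as the pairwise similarity measure are assumptions for illustration only; the claim language does not fix these choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RepetitionEncoder(nn.Module):
    """Hypothetical three-portion encoder: per-frame 2D features,
    temporal context across frames, and a per-frame embedding."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # First portion: 2D convolutional features for each video frame.
        self.frame_features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # -> (64, 4, 4) per frame
        )
        # Second portion: temporal context via a 3D convolution over time.
        self.temporal_context = nn.Conv3d(64, 64, kernel_size=3, padding=1)
        # Third portion: project pooled features to the encoded video frame.
        self.project = nn.Linear(64, embed_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, 3, H, W) -> encoded frames: (T, embed_dim)
        feats = self.frame_features(frames)              # (T, 64, 4, 4)
        feats = feats.permute(1, 0, 2, 3).unsqueeze(0)   # (1, 64, T, 4, 4)
        feats = F.relu(self.temporal_context(feats))     # (1, 64, T, 4, 4)
        feats = feats.squeeze(0).permute(1, 0, 2, 3)     # (T, 64, 4, 4)
        pooled = feats.mean(dim=(2, 3))                  # (T, 64)
        return self.project(pooled)                      # (T, embed_dim)


def temporal_self_similarity(embeddings: torch.Tensor) -> torch.Tensor:
    """Pairwise similarity between encoded video frames; here the negative
    squared Euclidean distance, normalized per row with a softmax."""
    dists = torch.cdist(embeddings, embeddings).pow(2)   # (T, T)
    return F.softmax(-dists, dim=-1)
```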
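
Continuing the sketch above, a hypothetical period predictor model portion consumes the T x T self-similarity matrix and emits, per frame, a distribution over candidate period lengths and a binary periodicity score. The layer choices and sizes are again illustrative assumptions, and the closing usage lines show one way, not dictated by the claim, to combine the two outputs into a repetition count.

```python
class PeriodPredictor(nn.Module):
    """Hypothetical period predictor head over the T x T similarity matrix."""

    def __init__(self, num_frames: int, max_period: int = 32, hidden: int = 128):
        super().__init__()
        # Treat the similarity matrix as a 1-channel image and give each
        # frame (row) a feature vector describing its similarity pattern.
        self.conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.row_proj = nn.Linear(32 * num_frames, hidden)
        # (a) per-frame distribution over candidate period lengths.
        self.period_head = nn.Linear(hidden, max_period)
        # (b) per-frame binary periodicity classification.
        self.periodicity_head = nn.Linear(hidden, 1)

    def forward(self, tsm: torch.Tensor):
        # tsm: (T, T)
        t = tsm.shape[0]
        x = self.conv(tsm.unsqueeze(0).unsqueeze(0))      # (1, 32, T, T)
        x = F.relu(x).permute(0, 2, 1, 3).reshape(t, -1)  # (T, 32*T)
        x = F.relu(self.row_proj(x))                      # (T, hidden)
        period_logits = self.period_head(x)               # (T, max_period)
        periodicity_logits = self.periodicity_head(x)     # (T, 1)
        return period_logits, periodicity_logits


# Usage sketch: end-to-end on a clip of 64 frames.
frames = torch.randn(64, 3, 112, 112)
encoder = RepetitionEncoder()
predictor = PeriodPredictor(num_frames=64)
tsm = temporal_self_similarity(encoder(frames))
period_logits, periodicity_logits = predictor(tsm)
# One way to derive a repetition count: sum, over frames classified as
# periodic, the reciprocal of the predicted per-frame period length.
period = period_logits.argmax(dim=-1).clamp(min=1).float()           # (T,)
is_periodic = torch.sigmoid(periodicity_logits.squeeze(-1)) > 0.5    # (T,)
count = (is_periodic.float() / period).sum()
```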