US 12,260,642 B2
Computerized system and method for fine-grained video frame classification and content creation therefrom
Deven Santosh Shah, San Jose, CA (US); Avijit Shah, Santa Clara, CA (US); Topojoy Biswas, San Jose, CA (US); and Biren Barodia, Milpitas, CA (US)
Assigned to YAHOO AD TECH LLC, New York, NY (US)
Filed by YAHOO AD TECH LLC, Dulles, VA (US)
Filed on Dec. 23, 2021, as Appl. No. 17/560,386.
Prior Publication US 2023/0206632 A1, Jun. 29, 2023
Int. Cl. G06V 10/764 (2022.01); G06V 20/40 (2022.01); G06V 20/70 (2022.01)

CPC G06V 20/44 (2022.01) [G06V 10/764 (2022.01); G06V 20/41 (2022.01); G06V 20/70 (2022.01)]

20 Claims

1. A method comprising:

identifying, by a device, a video depicting a key event, the video having a plurality of frames;

extracting, by the device, a sequence of frames from the plurality of frames;

determining, by the device, a camera view for each frame of the sequence of frames to form a sequence of camera views by applying a pre-trained panoptic segmentation model to identify relevant objects within the frame;

applying a pre-trained pose estimation model to estimate poses for any persons within the frame;

assigning a label to the frame based on a predetermined labeling function and semantic information yielded by the panoptic segmentation model and the pose estimation model; and

determining, by the device, a type of key event from the sequence of camera views by comparing the sequence of camera views to predetermined arrangements of camera views associated with different types of key events.
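
As an illustration only, the frame-labeling and sequence-comparison steps recited in claim 1 might be sketched as follows. Everything specific here is an assumption made for the example and is not taken from the patent: the `FrameSemantics` structure standing in for the combined outputs of the segmentation and pose models, the label names (`wide`, `close_up`, `crowd`), the thresholds, the `EVENT_PATTERNS` table, and the choice of contiguous-run matching as the comparison. The pre-trained models themselves are not invoked; their outputs are assumed to have been summarized per frame already.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class FrameSemantics:
    """Hypothetical per-frame summary of the two models' outputs."""
    object_area: dict  # class name -> fraction of frame area (segmentation)
    num_poses: int     # number of persons with an estimated pose


def label_frame(sem):
    """Illustrative labeling function; labels and thresholds are assumptions."""
    if sem.object_area.get("crowd", 0.0) > 0.5:
        return "crowd"
    if sem.num_poses <= 2 and sem.object_area.get("person", 0.0) > 0.3:
        return "close_up"
    if sem.object_area.get("field", 0.0) > 0.5:
        return "wide"
    return "other"


# Hypothetical arrangements of camera views for two event types.
EVENT_PATTERNS = {
    "goal": ["wide", "close_up", "crowd"],
    "foul": ["wide", "close_up", "wide"],
}


def collapse(views):
    """Merge consecutive identical labels into a single camera view."""
    return [view for view, _ in groupby(views)]


def contains(seq, pattern):
    """Return True if pattern occurs as a contiguous run inside seq."""
    n = len(pattern)
    return any(seq[i:i + n] == pattern for i in range(len(seq) - n + 1))


def classify_event(frames):
    """Label each frame, then compare the view sequence to known patterns."""
    views = collapse([label_frame(f) for f in frames])
    for event, pattern in EVENT_PATTERNS.items():
        if contains(views, pattern):
            return event
    return None  # no predetermined arrangement matched


if __name__ == "__main__":
    wide = FrameSemantics({"field": 0.7}, num_poses=10)
    close = FrameSemantics({"person": 0.5}, num_poses=1)
    crowd = FrameSemantics({"crowd": 0.8}, num_poses=0)
    print(classify_event([wide, wide, close, crowd, crowd]))
```

Collapsing consecutive duplicate labels before matching is a design choice of this sketch: it makes the comparison insensitive to how many frames each camera view lasts, so only the order of views matters.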