US 12,008,811 B2
Machine learning-based selection of a representative video frame within a messaging application
Kavya Venkata Kota Kopparapu, Herndon, VA (US); Benjamin Dodson, Dover, NH (US); Francesc Xavier Drudis Rius, Bellevue, WA (US); Angus Kong, Seattle, WA (US); Richard Leider, San Francisco, CA (US); Jian Ren, Marina del Rey, CA (US); Sergey Tulyakov, Marina del Rey, CA (US); and Jiayao Yu, Venice, CA (US)
Assigned to SNAP INC., Santa Monica, CA (US)
Filed by Snap Inc., Santa Monica, CA (US)
Filed on Dec. 14, 2021, as Appl. No. 17/550,852.
Claims priority of provisional application 63/131,839, filed on Dec. 30, 2020.
Prior Publication US 2022/0207875 A1, Jun. 30, 2022
Int. Cl. G06K 9/00 (2022.01); G06F 16/783 (2019.01); G06N 20/00 (2019.01); G06T 5/70 (2024.01); G06V 10/70 (2022.01); G06V 20/40 (2022.01); G06V 20/70 (2022.01)
CPC G06V 20/46 (2022.01) [G06F 16/785 (2019.01); G06N 20/00 (2019.01); G06T 5/70 (2024.01); G06V 10/70 (2022.01); G06V 20/70 (2022.01)]
20 Claims
1. A method, comprising:
receiving a set of video frames corresponding to a video;
determining a first subset of video frames by removing, from the set of video frames, those video frames which are outside of an image quality threshold;
determining a second subset of video frames by removing, from the first subset of video frames, those video frames which are outside of an image stillness threshold;
computing feature data for each video frame in the second subset of video frames;
providing, for each video frame in the second subset of video frames, the feature data of the video frame as input to a machine learning model,
wherein the machine learning model is configured to output a score for each video frame in the second subset of video frames based on the feature data of the video frame, the machine learning model having been trained with a first set of images labeled based on image aesthetics, and further having been trained with a second set of images labeled based on image quality, the first and second sets of images being associated with different domains; and
selecting, from among the second subset of video frames, a video frame to represent the set of video frames based on the scores output by the machine learning model.
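
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the claim does not specify the metrics, thresholds, features, or model. Here `sharpness`, `motion`, `features`, `toy_model`, `min_sharpness`, and `max_motion` are all hypothetical names and choices, frames are assumed to be small grayscale grids with pixel values in [0, 1], and a fixed linear scorer stands in for the trained machine learning model.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # assumed: grayscale image, pixel values in [0, 1]


def sharpness(frame: Frame) -> float:
    """Mean absolute horizontal gradient (hypothetical image quality measure)."""
    diffs = [abs(row[i + 1] - row[i]) for row in frame for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def motion(frame: Frame, previous: Frame) -> float:
    """Mean absolute pixel difference from the preceding frame (lower is stiller)."""
    diffs = [abs(a - b) for r1, r2 in zip(frame, previous) for a, b in zip(r1, r2)]
    return sum(diffs) / len(diffs)


def features(frame: Frame) -> List[float]:
    """Hypothetical feature data: mean brightness, variance, sharpness."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, variance, sharpness(frame)]


def toy_model(feature_data: Sequence[float]) -> float:
    """Stand-in for the trained model: rewards contrast and sharpness."""
    _, variance, sharp = feature_data
    return variance + sharp


def select_representative(
    frames: Sequence[Frame],
    model: Callable[[Sequence[float]], float] = toy_model,
    min_sharpness: float = 0.05,  # assumed image quality threshold
    max_motion: float = 0.2,      # assumed image stillness threshold
) -> int:
    """Return the index of the frame chosen to represent the video."""
    # First subset: drop frames outside the image quality threshold.
    first = [i for i, f in enumerate(frames) if sharpness(f) >= min_sharpness]
    # Second subset: drop frames outside the stillness threshold.
    # Assumption: frame 0 has no predecessor and is treated as still.
    second = [
        i for i in first
        if i == 0 or motion(frames[i], frames[i - 1]) <= max_motion
    ]
    if not second:
        raise ValueError("no frame passed both thresholds")
    # Score each surviving frame from its feature data, then pick the best.
    scores = {i: model(features(frames[i])) for i in second}
    return max(scores, key=scores.get)
```

Calling `select_representative(frames)` on a list of frames returns the index of the highest-scoring frame among those that pass both filters. Passing a different callable as `model` changes only the scoring step, which mirrors how the claim separates the two threshold filters from the model-based selection.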