CPC G06V 20/46 (2022.01) [G06F 40/20 (2020.01); G06T 7/11 (2017.01); G06T 7/194 (2017.01); G06T 11/60 (2013.01); G06V 20/41 (2022.01); G06V 20/48 (2022.01); G06V 20/49 (2022.01); G10L 15/22 (2013.01); G10L 15/26 (2013.01); G10L 25/57 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/20021 (2013.01)]
20 Claims

1. A method comprising:
obtaining an input video;
extracting video frames and audio data from the input video;
processing the video frames to determine a target video frame, by comparing the video frames with a pre-stored object image to determine one or more similarities, determining a target object in the video frames based on the one or more similarities, and performing a selection among the video frames based on object information of the target object to determine the target video frame;
processing the audio data to obtain text information;
determining, based on a corresponding time of the target video frame in the input video and a corresponding time of the text information in the input video, target text information corresponding to the target video frame; and
processing the target video frame and the target text information to generate graphic and text information.