US 12,277,766 B2
Information generation method and apparatus
Yi Meng, Hangzhou (CN); and Yi Xu, Hangzhou (CN)
Assigned to ALIBABA (CHINA) CO., LTD., Hangzhou (CN)
Filed by Alibaba (China) Co., Ltd., Binjiang District Hangzhou (CN)
Filed on May 13, 2022, as Appl. No. 17/743,496.
Claims priority of application No. 202110554169.1 (CN), filed on May 20, 2021.
Prior Publication US 2022/0375223 A1, Nov. 24, 2022
Int. Cl. G06V 20/40 (2022.01); G06F 40/20 (2020.01); G06T 7/11 (2017.01); G06T 7/194 (2017.01); G06T 11/60 (2006.01); G10L 15/22 (2006.01); G10L 15/26 (2006.01); G10L 25/57 (2013.01)
CPC G06V 20/46 (2022.01) [G06F 40/20 (2020.01); G06T 7/11 (2017.01); G06T 7/194 (2017.01); G06T 11/60 (2013.01); G06V 20/41 (2022.01); G06V 20/48 (2022.01); G06V 20/49 (2022.01); G10L 15/22 (2013.01); G10L 15/26 (2013.01); G10L 25/57 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/20021 (2013.01)] 20 Claims
 
1. A method comprising:
obtaining an input video;
extracting video frames and audio data from the input video;
processing the video frames to determine a target video frame, by comparing the video frames with a pre-stored object image to determine one or more similarities, determining a target object in the video frames based on the one or more similarities, and performing a selection among the video frames based on object information of the target object to determine the target video frame;
processing the audio data to obtain text information;
determining, based on a corresponding time of the target video frame in the input video and a corresponding time of the text information in the input video, target text information corresponding to the target video frame; and
processing the target video frame and the target text information to generate graphic and text information.
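Claim 1 does not say how the similarity to the pre-stored object image is computed, or which "object information" drives the frame selection. The following is a minimal sketch under stated assumptions, not the patented implementation. It assumes each frame has already been split into candidate regions, each with a feature vector (`embedding`) and the fraction of the frame it covers (`area_ratio`). It also assumes the pre-stored object image is represented by a feature vector of the same length. Similarity is cosine similarity, and a region whose similarity meets a threshold is treated as the target object. The frame in which that object occupies the largest area is selected. All names, the 0.8 threshold, and the choice of area as the selection criterion are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Region:
    embedding: list[float]  # hypothetical feature vector of a detected region
    area_ratio: float       # hypothetical "object information": region area / frame area


@dataclass
class Frame:
    timestamp: float        # seconds from the start of the input video
    regions: list[Region]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_target_frame(frames: list[Frame], object_embedding: list[float],
                        threshold: float = 0.8) -> Frame | None:
    """Return the frame where the matched target object covers the most area, or None."""
    best_frame, best_area = None, -1.0
    for frame in frames:
        for region in frame.regions:
            # A region counts as the target object if it is similar enough to the stored image.
            if cosine_similarity(region.embedding, object_embedding) >= threshold:
                if region.area_ratio > best_area:
                    best_frame, best_area = frame, region.area_ratio
    return best_frame


# Toy data: the stored object resembles [1, 0]; frame 2 shows it largest.
frames = [
    Frame(1.0, [Region([1.0, 0.0], 0.2)]),
    Frame(2.0, [Region([0.9, 0.1], 0.5), Region([0.0, 1.0], 0.9)]),
    Frame(3.0, [Region([0.0, 1.0], 0.7)]),
]
target = select_target_frame(frames, [1.0, 0.0])
print(target.timestamp if target else None)
```

Selecting by area is only one plausible reading of "object information". Position in the frame, sharpness, or detection confidence could be substituted in the inner comparison without changing the overall loop.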
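The time-alignment step of claim 1 pairs the target video frame with text whose time in the input video corresponds to the frame's time. The claim does not define "corresponding". Below is a minimal sketch, assuming speech recognition has already produced text segments with start and end times in seconds. It assumes a segment corresponds to the frame when its interval overlaps a window of `window` seconds on either side of the frame's timestamp. The `TextSegment` type, the `target_text_for_frame` function, and the 1-second default window are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextSegment:
    start: float  # seconds from the start of the input video
    end: float
    text: str     # recognized speech for this interval


def target_text_for_frame(frame_time: float, segments: list[TextSegment],
                          window: float = 1.0) -> str:
    """Join, in time order, the text of segments overlapping [frame_time - window, frame_time + window]."""
    lo, hi = frame_time - window, frame_time + window
    # Two closed intervals overlap when each one starts before the other ends.
    hits = [s for s in segments if s.start <= hi and s.end >= lo]
    hits.sort(key=lambda s: s.start)
    return " ".join(s.text for s in hits)


segments = [
    TextSegment(0.0, 3.0, "Welcome to the shop."),
    TextSegment(3.5, 6.0, "This jacket is waterproof."),
    TextSegment(9.0, 12.0, "Thanks for watching."),
]
print(target_text_for_frame(5.0, segments))
```

A fixed window keeps the sketch simple. A real system might instead pick the single nearest segment, or widen the window until a sentence boundary is reached.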