| CPC H04N 9/8715 (2013.01) [G06F 3/165 (2013.01); G06F 16/489 (2019.01); G06F 40/58 (2020.01); G06V 20/40 (2022.01); G06V 30/19 (2022.01); G10L 17/06 (2013.01); G10L 25/57 (2013.01)] | 17 Claims |

|
1. An interactive information processing method, comprising:
establishing a correspondence between a multimedia data stream and a display text generated based on the multimedia data stream;
presenting the multimedia data stream and the display text based on the correspondence; and
in response to detecting a triggering operation triggering a first display content in the display text, adjusting, based on a timestamp corresponding to the first display content and the correspondence, the multimedia data stream to navigate to a playback position corresponding to the first display content;
wherein the first display content comprises a text corresponding to speech in the multimedia data stream; and
wherein the display text and the multimedia data stream are displayed on different display areas of a page respectively, and a display area occupied by the display text is not superimposed on a display area occupied by the multimedia data stream,
wherein the interactive information processing method further comprises:
acquiring an audio-video frame of the multimedia data stream, and determining a user identity of a speaking user corresponding to the audio-video frame;
generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame;
acquiring a search content edited in a search content editing control, and acquiring a target content corresponding to the search content from the display text, wherein each target content is the same as the search content;
displaying the target content differentially in the display text, and marking the audio-video frame corresponding to the target content in a controlling control corresponding to the multimedia data stream; and
displaying the display text and the multimedia data stream on a target page, and
wherein displaying the display text and the multimedia data stream on the target page comprises:
displaying a first display text and a third display text in the display text and a recording screen video in preset display regions on the target page, respectively, wherein content displayed in the first display text is characters generated based on an audio frame comprised in the audio-video frame, and the third display text comprises at least one keyword or at least one key sentence; and determining a content corresponding to the target content from the first display text in response to detecting that the target content in the third display text is triggered, and displaying the content differentially.
|
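
The correspondence, navigation, and search steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Segment`, `Transcript`, `Player`, `on_text_clicked`, and `on_search` are hypothetical, the timestamps are assumed to be segment start times in seconds, and exact substring matching is assumed as one way to satisfy "each target content is the same as the search content".

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One piece of display text (hypothetical structure)."""
    start: float   # timestamp in seconds into the multimedia data stream
    speaker: str   # user identity of the speaking user
    text: str      # text generated from the speech


@dataclass
class Player:
    """Stand-in for the media player and its progress-bar control."""
    position: float = 0.0
    markers: list = field(default_factory=list)

    def seek(self, seconds: float) -> None:
        self.position = seconds


class Transcript:
    """Holds the correspondence between display text and timestamps."""

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.start)

    def search(self, query: str):
        """Return (segment index, character offset, timestamp) per exact match."""
        if not query:
            return []
        hits = []
        for i, seg in enumerate(self.segments):
            pos = seg.text.find(query)
            while pos != -1:
                hits.append((i, pos, seg.start))
                pos = seg.text.find(query, pos + len(query))
        return hits


def on_text_clicked(transcript: Transcript, player: Player, index: int) -> None:
    """Triggering a display content navigates playback to its timestamp."""
    player.seek(transcript.segments[index].start)


def on_search(transcript: Transcript, player: Player, query: str):
    """Collect matches to highlight and mark their positions on the player."""
    hits = transcript.search(query)
    player.markers = sorted({t for _, _, t in hits})
    return hits
```

In this sketch, a user interface would highlight each returned hit in the text pane and draw one marker per entry in `player.markers` on the playback control; how the text is actually rendered and how the speaker is identified are outside its scope.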