| CPC G11B 27/036 (2013.01) [G06V 20/41 (2022.01); G10L 17/02 (2013.01); G10L 25/57 (2013.01); G10L 25/63 (2013.01)] | 18 Claims |

1. A multimedia data recording method comprising:
performing real-time analysis on multimedia data to obtain voice content and a demonstration action of a target object, the multimedia data including first audio data and image frame data that are simultaneously collected;
determining whether the demonstration action is semantically consistent with the voice content, wherein the demonstration action is semantically consistent with the voice content when a first similarity between text content corresponding to the demonstration action and text content corresponding to the voice content is greater than a threshold;
in response to the demonstration action being semantically inconsistent with the voice content, performing video understanding on an image frame corresponding to the demonstration action to convert the demonstration action into second audio data; and
dynamically inserting the second audio data into the first audio data to update the multimedia data.
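The decision and insertion steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claim does not specify a similarity measure, a threshold value, or how the second audio data is inserted. The choices shown here are all hypothetical: cosine similarity over bag-of-words term counts, a threshold of 0.5, and splicing the new audio in at the action's timestamp. The function names (`cosine_similarity`, `is_semantically_consistent`, `insert_audio`, `update_recording`) are also hypothetical. Speech recognition, action-to-text recognition, and the video-understanding-to-speech conversion are not implemented. Their outputs are passed in as plain strings and as a list of audio samples.

```python
import math
from collections import Counter
from typing import List

# Hypothetical value: the claim only recites "a threshold".
SIMILARITY_THRESHOLD = 0.5


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of lowercase bag-of-words term counts (an assumed stand-in measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_semantically_consistent(action_text: str, voice_text: str,
                               threshold: float = SIMILARITY_THRESHOLD) -> bool:
    # Consistent only when the first similarity is strictly greater than the threshold.
    return cosine_similarity(action_text, voice_text) > threshold


def insert_audio(first_audio: List[float], second_audio: List[float],
                 sample_rate: int, at_seconds: float) -> List[float]:
    """Splice second_audio into first_audio at a time offset (an assumed insertion strategy)."""
    # Convert seconds to a sample index, clamped to the bounds of the first track.
    idx = max(0, min(len(first_audio), round(at_seconds * sample_rate)))
    return first_audio[:idx] + second_audio + first_audio[idx:]


def update_recording(first_audio: List[float], action_text: str, voice_text: str,
                     action_audio: List[float], sample_rate: int,
                     action_time: float) -> List[float]:
    # action_text: hypothetical text recognized from the demonstration action.
    # voice_text: hypothetical transcript of the voice content.
    # action_audio: hypothetical output of the video-understanding/speech step.
    if is_semantically_consistent(action_text, voice_text):
        return list(first_audio)  # consistent: recording left unchanged
    return insert_audio(first_audio, action_audio, sample_rate, action_time)
```

As a usage example, the voice content "now press the red button" and a recognized action "rotate the knob clockwise" share only the word "the". Their similarity is about 0.22, which falls below the assumed threshold. The action's audio is therefore spliced into the first audio at the action's timestamp. Splicing lengthens the audio track, so a real system would also need to keep the image frames aligned. Mixing the second audio into the first at the same offset is an alternative reading of "inserting" that preserves the original duration.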