CPC G10L 15/26 (2013.01) | 13 Claims |
1. A method for processing a video file, said video file comprising audio content and visual content, the visual content comprising text content, wherein the method comprises:
extracting, by a processing circuit comprising a processor and a memory, the text content in the visual content;
generating, by the processing circuit, a context information for the audio content based on the text content extracted from said visual content;
converting, by the processing circuit, the audio content into text by using the context information generated based on the text content extracted from the visual content of the video file;
generating, by the processing circuit, an additional context information for the audio content based on the text obtained by converting the audio content;
combining, by the processing circuit, the context information generated based on the text content extracted from the visual content with the additional context information in order to obtain a combined context information; and
re-converting, by the processing circuit, the audio content into text by using the combined context information.
|
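The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how context information is represented or how it influences conversion. The sketch assumes that context is a bag of word counts, that the audio is already available as a toy lattice of candidate words with acoustic scores, and that context acts as a simple additive score boost. All names (`extract_text`, `make_context`, `convert`, `process_video`, `boost`) are hypothetical.

```python
from collections import Counter

# Assumption: the audio is modelled as a toy lattice. Each position maps
# candidate words to acoustic scores. A real system would use an ASR engine.
Lattice = list[dict[str, float]]


def extract_text(frames: list[str]) -> str:
    """Stand-in for OCR: each frame is already its on-screen text."""
    return " ".join(frames)


def make_context(text: str) -> Counter:
    """Assumed context representation: counts of words longer than 3 letters."""
    return Counter(w for w in text.lower().split() if len(w) > 3)


def convert(lattice: Lattice, context: Counter, boost: float = 0.3) -> str:
    """Pick the best candidate per position; context words get a score boost
    proportional to their count (a Counter returns 0 for unseen words)."""
    words = []
    for candidates in lattice:
        best = max(candidates, key=lambda w: candidates[w] + boost * context[w])
        words.append(best)
    return " ".join(words)


def process_video(frames: list[str], lattice: Lattice) -> tuple[str, str]:
    visual_text = extract_text(frames)         # extract text from visual content
    context = make_context(visual_text)        # context from the extracted text
    first_pass = convert(lattice, context)     # convert audio using that context
    additional = make_context(first_pass)      # additional context from the transcript
    combined = context + additional            # combined context information
    second_pass = convert(lattice, combined)   # re-convert with combined context
    return first_pass, second_pass


frames = ["Kernel methods"]
lattice = [
    {"the": 0.9},
    {"gaussian": 0.9},
    {"colonel": 0.5, "kernel": 0.4},
    {"and": 0.9},
    {"gaussian": 0.8},
    {"noise": 0.9},
    {"caution": 0.5, "gaussian": 0.4},
]
first, second = process_video(frames, lattice)
print(first)   # the gaussian kernel and gaussian noise caution
print(second)  # the gaussian kernel and gaussian noise gaussian
```

In this toy run the on-screen text already resolves "kernel" against "colonel" in the first pass. The last word is corrected only in the second pass, once "gaussian" has entered the combined context through the first transcript.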