| CPC G06F 40/279 (2020.01) [G06F 40/205 (2020.01); G06F 40/268 (2020.01)] | 9 Claims |

|
1. A subtitle generating apparatus comprising:
processing circuitry configured to:
sequentially acquire texts from voice recognition results,
store the texts as archive datasets,
estimate a split position and a concatenation position of the texts from one or more of the archive datasets, and generate a subtitle text from said one or more of the archive datasets based on the split position and the concatenation position, and
update the archive datasets based on the split position and the concatenation position; and
a display configured to display the subtitle text,
wherein:
the processing circuitry is configured to:
continue estimation of the concatenation position so as to concatenate the texts in the archive datasets, until a number of characters of an obtained concatenated text reaches a threshold character number or greater, and
subject the concatenated text to one or both of a morphological analysis and a dependency parsing, generate a feature vector of each word based on a result of the analysis and the parsing, calculate a split estimation score for each word using the feature vector, and estimate the split position based on the split estimation score,
the split estimation score for said each word is calculated using an LSTM model adapted to take into account a context before the word, a context after the word, or a context before and after the word, and
the feature vector includes a word index assigned to said each word from the beginning of a sentence, a character length, a word class, and a result of dependency parsing.
|
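The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `Token`, `concatenate_until`, `feature_vectors`, `estimate_split`, `generate_subtitle`, `toy_analyze`, and `toy_scorer` are hypothetical and do not appear in the claim. The whitespace tokenizer stands in for the morphological analysis and dependency parsing, and a simple punctuation rule stands in for the LSTM model, which is not implemented here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (word index, character length, word class, dependency head index)
Feature = Tuple[int, int, str, int]


@dataclass
class Token:
    surface: str     # word as it appears in the text (trailing spaces kept)
    word_class: str  # word class from the analyzer (toy labels in this sketch)
    head: int        # index of the dependency head, -1 for the first word


def concatenate_until(archive: List[str], threshold: int) -> Tuple[str, int]:
    """Concatenate archived texts in order until the character count reaches
    the threshold. Returns the text and the number of entries consumed."""
    text, used = "", 0
    for piece in archive:
        if len(text) >= threshold:
            break
        text += piece
        used += 1
    return text, used


def feature_vectors(tokens: List[Token]) -> List[Feature]:
    """Build one feature tuple per word: index, length, class, dependency."""
    return [(i, len(t.surface.strip()), t.word_class, t.head)
            for i, t in enumerate(tokens)]


def estimate_split(tokens: List[Token],
                   scorer: Callable[[List[Feature], int], float]) -> int:
    """Score every word and return the index of the highest-scoring one.
    The text is split immediately after that word."""
    feats = feature_vectors(tokens)
    scores = [scorer(feats, i) for i in range(len(feats))]
    return max(range(len(scores)), key=scores.__getitem__)


def generate_subtitle(archive: List[str], threshold: int,
                      analyze: Callable[[str], List[Token]],
                      scorer: Callable[[List[Feature], int], float],
                      ) -> Tuple[Optional[str], List[str]]:
    """Return (subtitle, updated archive). The subtitle is None while the
    concatenated text is still shorter than the threshold."""
    text, used = concatenate_until(archive, threshold)
    if len(text) < threshold:
        return None, archive  # wait for more recognition results
    tokens = analyze(text)
    split = estimate_split(tokens, scorer)
    subtitle = "".join(t.surface for t in tokens[:split + 1]).rstrip()
    rest = "".join(t.surface for t in tokens[split + 1:])
    # Unused remainder goes back to the front of the archive.
    return subtitle, ([rest] if rest else []) + archive[used:]


def toy_analyze(text: str) -> List[Token]:
    """Assumed stand-in for a real analyzer: splits on whitespace, tags a word
    by its final character, and points each word at the previous one."""
    words = re.findall(r"\S+\s*", text)
    return [Token(w, "PUNCT_END" if w.strip()[-1] in ",.!?" else "WORD", i - 1)
            for i, w in enumerate(words)]


def toy_scorer(feats: List[Feature], i: int) -> float:
    """Assumed stand-in for the LSTM score: favors words ending in punctuation."""
    return 1.0 if feats[i][2] == "PUNCT_END" else 0.0


subtitle, archive = generate_subtitle(
    ["hello world, ", "this is ", "a test"], 15, toy_analyze, toy_scorer)
```

In this sketch the scorer receives the full list of feature tuples together with the index of the word being scored, so a replacement model could look at context before the word, after it, or both, as the claim describes.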