US 12,216,997 B2
Subtitle generating apparatus, subtitle generating method, and storage medium
Taira Ashikawa, Kawasaki Kanagawa (JP); and Kenji Iwata, Machida Tokyo (JP)
Assigned to KABUSHIKI KAISHA TOSHIBA, Tokyo (JP)
Filed by KABUSHIKI KAISHA TOSHIBA, Tokyo (JP)
Filed on Aug. 29, 2022, as Appl. No. 17/898,000.
Claims priority of application No. 2021-178498 (JP), filed on Nov. 1, 2021.
Prior Publication US 2023/0132418 A1, May 4, 2023
Int. Cl. G06F 40/279 (2020.01); G06F 40/205 (2020.01); G06F 40/268 (2020.01)

CPC G06F 40/279 (2020.01) [G06F 40/205 (2020.01); G06F 40/268 (2020.01)]

9 Claims

1. A subtitle generating apparatus comprising:

processing circuitry configured to:

sequentially acquire texts from voice recognition results,

store the texts as archive datasets,

estimate a split position and a concatenation position of the texts from one or more of the archive datasets, and generate a subtitle text from said one or more of the archive datasets based on the split position and the concatenation position, and

update the archive datasets based on the split position and the concatenation position; and

a display configured to display the subtitle text,

wherein:

the processing circuitry is configured to:

continue estimation of the concatenation position so as to concatenate the texts in the archive datasets, until a number of characters of an obtained concatenated text reaches a threshold character number or greater, and

subject the concatenated text to one or both of a morphological analysis and a dependency parsing, generate a feature vector of each word based on a result of the analysis and the parsing, calculate a split estimation score for each word using the feature vector, and estimate the split position based on the split estimation score,

the split estimation score for said each word is calculated using an LSTM model adapted to take into account a context before the word, a context after the word, or a context before and after the word, and

the feature vector includes a word index assigned to said each word from the beginning of a sentence, a character length, a word class, and a result of dependency parsing.
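
For illustration only, and not as part of the claim, the processing recited in claim 1 can be sketched in Python as follows. The sketch concatenates archived texts until a threshold character number is reached, builds a per-word feature vector with the four recited elements, and picks a split position from per-word scores. All names (WordFeature, concatenate_until, build_features, estimate_split, toy_scorer), the 0.5 minimum score, and the rule of choosing the highest-scoring word are assumptions made for this sketch. In particular, toy_scorer is a simple stand-in for the LSTM model, and the word classes and dependency heads are supplied by hand rather than produced by a morphological analyzer or dependency parser.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class WordFeature:
    """Hypothetical per-word feature vector with the four recited elements."""
    index: int        # word index counted from the beginning of the sentence
    char_length: int  # number of characters in the word
    word_class: str   # word class (part of speech)
    dep_head: int     # dependency parsing result: index of the head, -1 for root


def concatenate_until(archive: Sequence[str], threshold: int) -> Tuple[str, int]:
    """Concatenate archived texts in order until the concatenated text has
    at least `threshold` characters. Returns (text, number of texts used)."""
    text = ""
    used = 0
    for piece in archive:
        if len(text) >= threshold:
            break
        text += piece
        used += 1
    return text, used


def build_features(words: Sequence[str],
                   classes: Sequence[str],
                   heads: Sequence[int]) -> List[WordFeature]:
    """Assemble one feature vector per word from already-analyzed input."""
    return [WordFeature(i, len(w), c, h)
            for i, (w, c, h) in enumerate(zip(words, classes, heads))]


# A scorer maps (all features, position) to a split estimation score.
Scorer = Callable[[Sequence[WordFeature], int], float]


def toy_scorer(features: Sequence[WordFeature], i: int) -> float:
    """Stand-in for the LSTM model (assumption): favors punctuation."""
    return 0.9 if features[i].word_class == "PUNCT" else 0.1


def estimate_split(features: Sequence[WordFeature],
                   scorer: Scorer,
                   min_score: float = 0.5) -> Optional[int]:
    """Return the index of the highest-scoring word as the split position,
    or None if there are no words or no score reaches `min_score`."""
    if not features:
        return None
    scores = [scorer(features, i) for i in range(len(features))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= min_score else None


if __name__ == "__main__":
    archive = ["hello ", "world, ", "this is a test"]
    text, used = concatenate_until(archive, threshold=10)
    feats = build_features(["hello", "world", ","],
                           ["INTJ", "NOUN", "PUNCT"],
                           [-1, 0, 1])
    print(repr(text), used, estimate_split(feats, toy_scorer))
```

In this sketch the split position is a word index, and the texts consumed by concatenate_until (the returned count) would be the ones replaced when the archive datasets are updated. A real implementation would replace toy_scorer with a trained model that reads context before the word, after it, or both.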