| CPC G10L 19/022 (2013.01) [G10L 15/22 (2013.01); G10L 25/87 (2013.01)] | 8 Claims |

1. A voice conversation reconstruction method performed by a voice conversation reconstruction apparatus, the method comprising:
acquiring a plurality of speaker-specific voice recognition data corresponding to a plurality of speakers in a voice conversation;
dividing each of the plurality of speaker-specific voice recognition data into a plurality of blocks using boundaries between tokens such that each of the divided plurality of blocks includes voice data by only a single speaker, wherein the divided plurality of blocks are not in chronological order;
arranging the plurality of blocks of all the speaker-specific voice recognition data in chronological order without distinction of speaker;
merging, among the arranged plurality of blocks, neighboring blocks whose speakers are the same such that the speaker-specific voice recognition data in each of the merged blocks are in chronological order and include voice data by only the same speaker; and
reconstructing the plurality of blocks subjected to the merging in a conversation format, in chronological order and speaker by speaker, such that the speaker-specific voice recognition data in each of the reconstructed blocks are in chronological order and include voice data by only the same speaker,
wherein the steps are performed in order,
wherein acquiring the plurality of speaker-specific voice recognition data includes:
acquiring a first speaker-specific recognition result generated upon an End Point Detection (EPD), and a second speaker-specific recognition result generated at every preset time, and
collecting the first speaker-specific recognition result and the second speaker-specific recognition result without overlap or redundancy therebetween to generate the speaker-specific voice recognition data, and
wherein the second speaker-specific recognition result is generated after occurrence of a last EPD at which the first speaker-specific recognition result is generated.
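The arranging, merging, and reconstructing steps of claim 1 can be sketched as the following minimal Python example. This is an illustrative assumption, not the patented implementation: the `Block` structure, the use of a start timestamp as the chronological key, and the string-based conversation format are all hypothetical choices made for clarity.

```python
from dataclasses import dataclass

@dataclass
class Block:
    speaker: str   # single speaker per block, per the claim
    start: float   # timestamp used for chronological ordering (assumed)
    text: str      # recognized tokens of this block

def merge_and_reconstruct(per_speaker):
    """per_speaker maps each speaker to that speaker's blocks,
    which need not be in chronological order."""
    # Arrange all blocks chronologically, without distinction of speaker.
    blocks = sorted(
        (b for bs in per_speaker.values() for b in bs),
        key=lambda b: b.start,
    )
    # Merge neighboring blocks whose speakers are the same.
    merged = []
    for b in blocks:
        if merged and merged[-1].speaker == b.speaker:
            prev = merged[-1]
            merged[-1] = Block(prev.speaker, prev.start, prev.text + " " + b.text)
        else:
            merged.append(Block(b.speaker, b.start, b.text))
    # Reconstruct in a conversation format, speaker by speaker.
    return [f"{b.speaker}: {b.text}" for b in merged]

per_speaker = {
    "A": [Block("A", 0.0, "hello"), Block("A", 1.0, "everyone")],
    "B": [Block("B", 2.0, "hi")],
}
print(merge_and_reconstruct(per_speaker))
# → ['A: hello everyone', 'B: hi']
```

Sorting by a single timestamp key is one possible reading of "arranging ... in chronological order"; the claim itself does not fix the ordering key, and the deduplicating collection of the EPD-based and interval-based recognition results (the "without overlap or redundancy" step) is omitted here.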