US 12,112,769 B2
System, user terminal, and method for providing automatic interpretation service based on speaker separation
Jeong Uk Bang, Daejeon (KR); Seung Yun, Daejeon (KR); Sang Hun Kim, Daejeon (KR); Min Kyu Lee, Daejeon (KR); and Joon Gyu Maeng, Daejeon (KR)
Assigned to Electronics and Telecommunications Research Institute, Daejeon (KR)
Filed by Electronics and Telecommunications Research Institute, Daejeon (KR)
Filed on Nov. 19, 2021, as Appl. No. 17/531,316.
Claims priority of application No. 10-2021-0000912 (KR), filed on Jan. 5, 2021; and application No. 10-2021-0106300 (KR), filed on Aug. 11, 2021.
Prior Publication US 2022/0215857 A1, Jul. 7, 2022
Int. Cl. G10L 25/84 (2013.01); G06F 40/40 (2020.01); G06F 40/58 (2020.01); G10L 13/00 (2006.01); G10L 15/02 (2006.01); G10L 15/08 (2006.01); G10L 15/26 (2006.01); G10L 21/0208 (2013.01)
CPC G10L 25/84 (2013.01) [G06F 40/40 (2020.01); G06F 40/58 (2020.01); G10L 13/00 (2013.01); G10L 15/02 (2013.01); G10L 15/08 (2013.01); G10L 15/26 (2013.01); G10L 21/0208 (2013.01)]
12 Claims
 
1. A method of performing automatic interpretation based on speaker separation by a user terminal, the method comprising:
receiving a first speech signal including at least one of a user speech of a user and a user surrounding speech around the user from an automatic interpretation service providing terminal;
separating the first speech signal into speaker-specific speech signals;
performing interpretation on the speaker-specific speech signals in a language selected by the user on the basis of an interpretation mode; and
providing a second speech signal generated as a result of the interpretation to at least one of a counterpart terminal and the automatic interpretation service providing terminal according to the interpretation mode,
wherein the performing of interpretation on the speaker-specific speech signals in the language selected by the user on the basis of the interpretation mode includes:
extracting situation information from the user surrounding speech;
wherein the providing of the second speech signal generated as the result of the interpretation to the at least one of the counterpart terminal and the automatic interpretation service providing terminal according to the interpretation mode includes:
classifying speech signals according to each speaker from the extracted situation information; and
providing the automatic interpretation service providing terminal with an interpretation result in which intensity information and echo information of the speech signals classified according to each speaker are reflected.
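For illustration only, the steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the patented implementation. The mode names "conversation" and "listening", the Segment and Interpreted types, the pre-labelled speaker_id field, the pre-estimated echo value, and the translate callback are all hypothetical stand-ins; the claim does not specify how separation, echo estimation, or translation are carried out. Intensity is approximated here by root-mean-square amplitude, which is likewise an assumption.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    speaker_id: str       # assumed label; a real system would infer it by speaker separation
    samples: List[float]  # raw audio samples of this segment
    text: str             # stand-in for the recognized speech content
    echo: float = 0.0     # assumed, pre-estimated echo level in [0, 1]


@dataclass
class Interpreted:
    speaker_id: str
    text: str
    intensity: float
    echo: float


def separate(first_signal: List[Segment]) -> Dict[str, List[Segment]]:
    """Placeholder for speaker separation: group segments by their speaker label."""
    by_speaker: Dict[str, List[Segment]] = {}
    for seg in first_signal:
        by_speaker.setdefault(seg.speaker_id, []).append(seg)
    return by_speaker


def rms(samples: List[float]) -> float:
    """Root-mean-square amplitude, used here as a simple intensity measure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def extract_situation(segments: List[Segment]) -> Dict[str, float]:
    """Hypothetical situation information: per-speaker intensity and echo."""
    samples = [s for seg in segments for s in seg.samples]
    echo = max((seg.echo for seg in segments), default=0.0)
    return {"intensity": rms(samples), "echo": echo}


def interpret(
    first_signal: List[Segment],
    user_id: str,
    target_lang: str,
    mode: str,
    translate: Callable[[str, str], str],
) -> Dict[str, List[Interpreted]]:
    """Separate, interpret, and route results according to the (assumed) mode."""
    results: Dict[str, List[Interpreted]] = {"counterpart": [], "service_terminal": []}
    for speaker_id, segments in separate(first_signal).items():
        text = translate(" ".join(seg.text for seg in segments), target_lang)
        situation = extract_situation(segments)
        item = Interpreted(speaker_id, text, situation["intensity"], situation["echo"])
        if mode == "conversation" and speaker_id == user_id:
            # Assumed: the user's own speech is sent to the counterpart terminal.
            results["counterpart"].append(item)
        elif mode == "listening" and speaker_id != user_id:
            # Assumed: surrounding speech goes back to the service providing
            # terminal, carrying per-speaker intensity and echo information.
            results["service_terminal"].append(item)
    return results
```

In this sketch the routing rule is the only place where the interpretation mode matters; a real system would replace separate and translate with speaker-separation and machine-translation components and would estimate echo from the audio itself.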