US 12,431,142 B2
Apparatus and method for improving context-based automatic interpretation performance
Seung Yun, Daejeon (KR); Jeonguk Bang, Daejeon (KR); Min Kyu Lee, Daejeon (KR); and Sanghun Kim, Daejeon (KR)
Assigned to Electronics and Telecommunications Research Institute, Daejeon (KR)
Filed by ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, Daejeon (KR)
Filed on Dec. 21, 2022, as Appl. No. 18/085,889.
Claims priority of application No. 10-2022-0002266 (KR), filed on Jan. 6, 2022.
Prior Publication US 2023/0290360 A1, Sep. 14, 2023
Int. Cl. G10L 19/00 (2013.01); G10L 25/78 (2013.01)
CPC G10L 19/00 (2013.01) [G10L 25/78 (2013.01)] 6 Claims
OG exemplary drawing
 
1. An apparatus for improving context-based automatic interpretation performance, comprising: an uttered voice input unit configured to receive a voice signal from a user; a previous sentence input unit configured to transmit the user's previous utterance when the voice signal is input through the uttered voice input unit; a voice encoding processing unit configured to encode only the voice signal received through the uttered voice input unit and transmit a resulting vector; a context encoding processing unit configured to, when there is a previous utterance, extract a context vector from the previous utterance transmitted through the previous sentence input unit and transmit the extracted context vector of the previous utterance; and an interpretation decoding processing unit configured to, when it is determined by the previous sentence input unit that there is the user's previous utterance, output an interpretation result text by performing interpretation decoding using attention information of the encoded voice signal vector and attention information of the extracted context vector of the previous utterance based on a multilingual embedding language model, and, when it is determined by the previous sentence input unit that there is no previous utterance, to perform decoding using only the voice signal encoded through the voice encoding processing unit, wherein the multilingual embedding language model includes: a learning information collection unit configured to collect a sufficient amount of interpretation target language multilingual text to train the multilingual embedding language model; a preprocessing unit configured to produce preprocessed text by deleting inappropriate symbols from the collected multilingual text or converting them; a tokenizing processing unit configured to tokenize the preprocessed text into a preset form; an embedding processing unit configured to embed the tokenized preprocessed text to correspond to the multilingual text; and a multilingual embedding language model learning unit configured to train the model on the embedded multilingual text.