US 12,333,266 B2
Training method, text translation method, electronic device, and storage medium
Xiyang Wang, Beijing (CN); Ruiqing Zhang, Beijing (CN); Zhongjun He, Beijing (CN); Zhi Li, Beijing (CN); and Hua Wu, Beijing (CN)
Assigned to BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed by BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed on Nov. 8, 2022, as Appl. No. 17/982,965.
Claims priority of application No. 202111358246.2 (CN), filed on Nov. 16, 2021.
Prior Publication US 2023/0076471 A1, Mar. 9, 2023
Int. Cl. G06F 40/58 (2020.01); G06F 40/279 (2020.01); G06F 40/42 (2020.01); G06F 40/51 (2020.01)
CPC G06F 40/58 (2020.01) [G06F 40/51 (2020.01)] 20 Claims
OG exemplary drawing
 
1. A method of training a text translation model, implemented by an electronic device, the method comprising:
performing a feature extraction on source sample text data of a source sample text by an encoder to obtain a sample feature vector sequence, wherein the source sample text data is obtained by masking at least one target object in original source sample text data and wherein the sample feature vector sequence comprises a sample feature vector corresponding to each target object in the at least one target object;
obtaining a target sample feature vector by fusing a plurality of sample feature vectors included in the sample feature vector sequence;
performing an autoregressive decoding and a parallel non-autoregressive decoding on the sample feature vector sequence, by an autoregressive decoder and a non-autoregressive decoder respectively, to obtain an autoregressive text translation result and a non-autoregressive text translation result;
processing the target sample feature vector by a first length predictor to obtain a first predicted length value of the source sample text and processing the target sample feature vector by a second length predictor to obtain a second predicted length value of a translation sample text, wherein the first length predictor and the second length predictor are linear classifiers and wherein the translation sample text represents a translation text corresponding to the source sample text;
training a predetermined model by using translation sample text data of the translation sample text, the autoregressive text translation result, the non-autoregressive text translation result, a true length value of the source sample text, the first predicted length value, a true length value of the translation sample text, and the second predicted length value, wherein the training comprises adjusting a model parameter of the predetermined model based on a loss function until a predetermined condition is met, wherein the predetermined condition comprises that an output value converges or a number of training epochs reaches a maximum value for the training, and wherein the translation sample text data represents a true text translation result corresponding to the source sample text; and
obtaining the text translation model according to the trained predetermined model.
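The claimed pipeline — mask target objects in the source sample text, encode to a sample feature vector sequence, fuse into a target sample feature vector, and run the two linear length classifiers whose losses join the decoder losses — can be sketched as below. This is an illustrative toy only, not the patented implementation: all dimensions, the embedding-table encoder, the mean-pooling fusion, and every function name are hypothetical choices (the claim specifies only that the length predictors are linear classifiers and that fusion produces one vector).

```python
import math
import random

random.seed(0)

# Hypothetical sizes; the claim fixes no dimensions.
VOCAB, D, MAX_LEN = 50, 8, 16
MASK_ID = 0  # stand-in id for a masked target object

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)]
            for _ in range(rows)]

emb = rand_matrix(VOCAB, D)      # toy encoder: an embedding table
W_src = rand_matrix(D, MAX_LEN)  # first length predictor (linear classifier)
W_tgt = rand_matrix(D, MAX_LEN)  # second length predictor (linear classifier)

def encode(token_ids):
    """Map masked source tokens to a sample feature vector sequence."""
    return [emb[t] for t in token_ids]

def fuse(feature_seq):
    """Fuse the vector sequence into one target sample feature vector
    (mean pooling, one simple fusion choice)."""
    n = len(feature_seq)
    return [sum(v[i] for v in feature_seq) / n for i in range(D)]

def length_loss(vec, W, true_len):
    """Cross-entropy of a linear length classifier against the
    true length value."""
    logits = [sum(vec[i] * W[i][j] for i in range(D))
              for j in range(MAX_LEN)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[true_len]

# Toy source sample text: mask one target object (position 2).
source = [7, 3, 9, 4]
masked = [MASK_ID if i == 2 else t for i, t in enumerate(source)]

feats = encode(masked)           # sample feature vector sequence
target_vec = fuse(feats)         # target sample feature vector

loss_src = length_loss(target_vec, W_src, true_len=len(source))
loss_tgt = length_loss(target_vec, W_tgt, true_len=5)  # translation length

# In the full method these length losses join the autoregressive and
# non-autoregressive translation losses in one training objective,
# minimized until convergence or a maximum number of epochs.
total_length_loss = loss_src + loss_tgt
```

Both length predictors read the same fused vector, which matches the claim's requirement that a single target sample feature vector feeds both the source-length and translation-length classifiers.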