US 12,265,790 B2
Method for correcting text, method for generating text correction model, device
Ruiqing Zhang, Beijing (CN); Zhongjun He, Beijing (CN); and Hua Wu, Beijing (CN)
Assigned to BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed by BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed on Nov. 7, 2022, as Appl. No. 18/053,034.
Claims priority of application No. 202111350558.9 (CN), filed on Nov. 15, 2021.
Prior Publication US 2023/0090625 A1, Mar. 23, 2023
Int. Cl. G06F 40/279 (2020.01); G06F 40/166 (2020.01)
CPC G06F 40/279 (2020.01) [G06F 40/166 (2020.01)]
8 Claims
 
1. A method for correcting a text, comprising:
acquiring a text to be corrected;
acquiring a phonetic symbol sequence of the text to be corrected; and
obtaining a corrected text by inputting the text to be corrected and the phonetic symbol sequence into a text correction model, wherein, the text correction model obtains the corrected text by: detecting an error word in the text to be corrected, determining a phonetic symbol corresponding to the error word in the phonetic symbol sequence, and adding a phonetic feature corresponding to the phonetic symbol behind the error word to obtain a phonetic symbol text, and correcting the error word and the phonetic feature in the phonetic symbol text to obtain the corrected text;
wherein, the text correction model comprises an error detection submodel and an error correction submodel, each of the error detection submodel and the error correction submodel comprises one encoder and one decoder, and the two submodels share the one encoder; the error detection submodel performs encoding and a binary classification mapping on a vector representation of the input sample text to obtain a binary classification result; the error correction submodel performs encoding and one classification mapping on the vector representation of the input sample phonetic symbol text to obtain a corrected result;
wherein, the text correction model corrects the error word and the phonetic feature in the phonetic symbol text by the following to obtain the corrected text:
obtaining a candidate correction text by correcting the error word and the phonetic feature in the phonetic symbol text; and
obtaining the corrected text by performing de-duplication processing on the candidate correction text.
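The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented model: the trained error detection and error correction submodels are replaced by hand-supplied error positions and a toy lookup table, and every name in it (`build_phonetic_symbol_text`, `toy_correct`, `deduplicate`, `lexicon`) is hypothetical. It also assumes one reading of the claim, namely that the corrector rewrites both the error word and the inserted phonetic feature to the same word, so that de-duplication amounts to dropping the inserted slot.

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_phonetic_symbol_text(
    tokens: Sequence[str],
    phonetics: Sequence[str],
    error_positions: Set[int],
) -> Tuple[List[str], Set[int]]:
    """Insert the phonetic symbol of each flagged token right behind it.

    Returns the phonetic symbol text and the indices of the inserted slots.
    """
    out: List[str] = []
    inserted: Set[int] = set()
    for i, (tok, ph) in enumerate(zip(tokens, phonetics)):
        out.append(tok)
        if i in error_positions:
            inserted.add(len(out))  # index the phonetic feature will occupy
            out.append(ph)
    return out, inserted


def toy_correct(
    symbol_text: Sequence[str],
    inserted: Set[int],
    lexicon: Dict[str, str],
) -> List[str]:
    """Stand-in for the error correction submodel (assumption, not the patent's model).

    Rewrites each error word and its phonetic feature to the word that the
    lexicon lists for that phonetic symbol, falling back to the original word.
    """
    out = list(symbol_text)
    for i in inserted:
        fixed = lexicon.get(symbol_text[i], symbol_text[i - 1])
        out[i - 1] = fixed  # the error word
        out[i] = fixed      # the phonetic feature behind it
    return out


def deduplicate(candidate: Sequence[str], inserted: Set[int]) -> List[str]:
    """Drop the inserted slots, which now duplicate the word before them."""
    return [tok for i, tok in enumerate(candidate) if i not in inserted]


# Text to be corrected: the last character is a homophone error (经 for 京).
tokens = ["我", "在", "北", "经"]
phonetics = ["wo", "zai", "bei", "jing"]
error_positions = {3}        # would come from the error detection submodel
lexicon = {"jing": "京"}     # toy substitute for the learned corrector

symbol_text, inserted = build_phonetic_symbol_text(tokens, phonetics, error_positions)
candidate = toy_correct(symbol_text, inserted, lexicon)
corrected = deduplicate(candidate, inserted)
print("".join(corrected))
```

In this sketch the inserted positions are tracked explicitly so that de-duplication never removes a legitimately repeated word elsewhere in the text; a real system might instead compare adjacent outputs or rely on the model's own alignment.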