US 12,277,387 B2
Text processing method
Ruiqing Zhang, Beijing (CN); Zhongjun He, Beijing (CN); Zhi Li, Beijing (CN); and Hua Wu, Beijing (CN)
Assigned to BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed by BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed on Nov. 16, 2022, as Appl. No. 18/056,197.
Claims priority of application No. 202111357039.5 (CN), filed on Nov. 16, 2021.
Prior Publication US 2023/0101401 A1, Mar. 30, 2023
Int. Cl. G06F 40/232 (2020.01); G06F 40/279 (2020.01); G06F 40/53 (2020.01)
CPC G06F 40/232 (2020.01) [G06F 40/279 (2020.01); G06F 40/53 (2020.01)]
17 Claims
 
1. A text processing method, comprising:
determining, by using a first module of an encoder, a fragment feature information of a text fragment and a fragment feature information of each of a plurality of reference text fragments respectively, wherein the plurality of reference text fragments and the text fragment are extracted from a same text to be processed;
determining, by using a second module of an encoder, an overall feature information of the text fragment based on the fragment feature information of the text fragment and the fragment feature information of each of the plurality of reference text fragments;
determining a character feature information corresponding to the target position in the text fragment based on the overall feature information of the text fragment;
determining a first probability value of each candidate character of a plurality of candidate characters corresponding to the target position in the text fragment based on the character feature information corresponding to the target position, wherein the character feature information is determined based on a context of the target position in the text fragment;
determining a second probability value of each candidate character of the plurality of candidate characters based on a character string including the candidate character and at least one character in at least one position in the text fragment adjacent to the target position; and
determining a correction character at the target position based on the first probability value and the second probability value of each candidate character of the plurality of candidate characters.
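
The last three steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not state how the second probability value is computed from the character string or how the two values are combined, so the add-one-smoothed n-gram frequency and the weighted log-linear combination below are assumptions. The first probability values, which the claim derives from the encoder's character feature information, are supplied here as given numbers. All names (`second_probability`, `choose_correction`, `ngram_counts`, `weight`) and all counts are hypothetical.

```python
import math
from typing import Dict, Iterable


def second_probability(
    candidates: Iterable[str],
    left: str,
    right: str,
    ngram_counts: Dict[str, int],
) -> Dict[str, float]:
    """Score each candidate by the relative frequency of the character
    string formed with the characters adjacent to the target position.
    Add-one smoothing is an assumption, used to avoid zero probabilities."""
    raw = {c: ngram_counts.get(left + c + right, 0) + 1 for c in candidates}
    total = sum(raw.values())
    return {c: count / total for c, count in raw.items()}


def choose_correction(
    first_prob: Dict[str, float],
    second_prob: Dict[str, float],
    weight: float = 0.5,
) -> str:
    """Pick the candidate with the highest weighted sum of log-probabilities.
    The log-linear combination rule is an assumption, not taken from the claim."""
    def score(c: str) -> float:
        return weight * math.log(first_prob[c]) + (1.0 - weight) * math.log(second_prob[c])
    return max(first_prob, key=score)


# Text fragment "the cot sat"; the target position is the "o" in "cot".
# Hypothetical first probability values from the context-based model.
first = {"a": 0.4, "o": 0.5, "u": 0.1}

# Hypothetical counts of three-character strings in a reference corpus.
counts = {"cat": 50, "cot": 5, "cut": 20}

second = second_probability(first.keys(), left="c", right="t", ngram_counts=counts)
best = choose_correction(first, second)
```

With these illustrative numbers the context-based model slightly prefers "o", but the string-based score strongly favors "cat", so the combined decision selects "a" as the correction character. Setting `weight` to 1.0 ignores the second probability value and leaves the original "o" in place.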