US 11,914,968 B2
Official document processing method, device, computer equipment and storage medium
Xiaohui Jin, Guangdong (CN); Xiaowen Ruan, Guangdong (CN); and Liang Xu, Guangdong (CN)
Assigned to PING AN TECHNOLOGY (SHENZHEN) CO., LTD., Shenzhen (CN)
Appl. No. 17/620,817
Filed by PING AN TECHNOLOGY (SHENZHEN) CO., LTD., Guangdong (CN)
PCT Filed Dec. 11, 2020, PCT No. PCT/CN2020/135718
§ 371(c)(1), (2) Date Dec. 20, 2021,
PCT Pub. No. WO2021/121158, PCT Pub. Date Jun. 24, 2021.
Claims priority of application No. 202010523793.0 (CN), filed on Jun. 10, 2020.
Prior Publication US 2022/0414345 A1, Dec. 29, 2022
Int. Cl. G06F 40/40 (2020.01); G06F 40/103 (2020.01); G06V 30/418 (2022.01); G06V 30/412 (2022.01)
CPC G06F 40/40 (2020.01) [G06F 40/103 (2020.01); G06V 30/412 (2022.01); G06V 30/418 (2022.01)]
19 Claims
 
1. An official document processing method, comprising:
receiving a review request, sent by a user, that contains a to-be-reviewed official document, performing format analysis on the to-be-reviewed official document and acquiring a file type of the to-be-reviewed official document, then acquiring the to-be-reviewed official document of a standard file type, and identifying all file component contents in the to-be-reviewed official document of the standard file type by a preset BERT model;
performing text format detection, text content detection and frame layout detection synchronously by a preset text processing model constructed based on a distributed framework, and obtaining a format detection result, a content detection result and a layout detection result; wherein the text format detection comprises the following steps: calling a format detection rule corresponding to each file component content, extracting a text format keyword in the file component content, and obtaining the format detection result according to the text format keyword and a format bar in the format detection rule corresponding to the text format keyword; the text content detection comprises a step of obtaining the content detection result after performing text content detection on the file component content; and the frame layout detection comprises the following steps: dividing coordinate information of the to-be-reviewed official document of the standard file type, and performing frame layout detection on the to-be-reviewed official document according to the divided coordinate information to obtain the layout detection result; and
generating a detected error content according to the format detection result, the content detection result and the layout detection result, calling out a standard writing rule corresponding to the detected error content, marking the detected error content and the standard writing rule at a preset position in the to-be-reviewed official document, and sending the successfully marked to-be-reviewed official document to a preset receiving location according to a storage path designated by the user.
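
For illustration only, the synchronous execution of the three detections recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: a local thread pool stands in for the distributed framework, the BERT-based component identification is assumed to have already produced the `Component` records, and the names `Component`, `FORMAT_RULES`, `PAGE_BOX`, the font labels and the page dimensions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Component:
    name: str    # hypothetical component label, e.g. "title" or "body"
    text: str    # text content of the component
    font: str    # text format keyword extracted from the component
    box: tuple   # (x0, y0, x1, y1) coordinates of the component on the page


# Hypothetical rule tables; a real system would load the applicable standard.
FORMAT_RULES = {"title": "SimSun-22", "body": "FangSong-16"}
PAGE_BOX = (28.0, 37.0, 182.0, 262.0)  # assumed printable area


def detect_format(components):
    """Compare each component's format keyword with its format rule."""
    errors = []
    for c in components:
        expected = FORMAT_RULES.get(c.name)
        if expected is not None and c.font != expected:
            errors.append((c.name, "format",
                           f"expected font {expected}, found {c.font}"))
    return errors


def detect_content(components):
    """Placeholder content check: flag components with no text."""
    return [(c.name, "content", "component text is empty")
            for c in components if not c.text.strip()]


def detect_layout(components):
    """Flag components whose coordinates fall outside the printable area."""
    x0, y0, x1, y1 = PAGE_BOX
    errors = []
    for c in components:
        cx0, cy0, cx1, cy1 = c.box
        if cx0 < x0 or cy0 < y0 or cx1 > x1 or cy1 > y1:
            errors.append((c.name, "layout",
                           "component extends outside the printable area"))
    return errors


def review(components):
    """Run the three detections concurrently and merge their results."""
    detectors = (detect_format, detect_content, detect_layout)
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = [pool.submit(d, components) for d in detectors]
        results = [f.result() for f in futures]
    # Results are merged in detector order: format, content, layout.
    return [error for result in results for error in result]
```

Each tuple returned by `review` pairs a component with the kind of error found and a message that could serve as the standard writing rule to mark in the document; the marking and delivery steps of the claim are not modeled here.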