US 12,154,361 B2
Method and apparatus of image-to-document conversion based on OCR, device, and readable storage medium
Xingyao Chen, Shenzhen Guangdong (CN); Canlu Huang, Shenzhen Guangdong (CN); Wencan Hu, Shenzhen Guangdong (CN); Yidong Chen, Guangdong (CN); Hanquan Lin, Guangdong (CN); Fei Huang, Guangdong (CN); Geyang Ke, Guangdong (CN); and Zhiquan Yang, Guangdong (CN)
Assigned to Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Guangdong (CN)
Filed on May 6, 2021, as Appl. No. 17/313,755.
Application 17/313,755 is a continuation of application No. PCT/CN2020/078181, filed on Mar. 6, 2020.
Claims priority of application No. 201910224228.1 (CN), filed on Mar. 22, 2019.
Prior Publication US 2021/0256253 A1, Aug. 19, 2021
Int. Cl. G06F 17/00 (2019.01); G06F 18/214 (2023.01); G06F 40/106 (2020.01); G06V 10/44 (2022.01); G06V 30/146 (2022.01); G06V 30/148 (2022.01); G06V 30/414 (2022.01); G06V 30/10 (2022.01)
CPC G06V 30/414 (2022.01) [G06F 18/214 (2023.01); G06F 40/106 (2020.01); G06V 10/44 (2022.01); G06V 30/1463 (2022.01); G06V 30/15 (2022.01); G06V 30/10 (2022.01); G06V 30/146 (2022.01)]
20 Claims
1. A method of image-to-document conversion based on optical character recognition (OCR), the method comprising:
obtaining an image to be converted into a target document;
classifying regions of the image according to image content of the image, to obtain n image regions, each of the n image regions being classified to a corresponding content type, n being an integer greater than or equal to 3, and at least 3 of the n image regions being classified into different respective content types, wherein the n image regions are obtained by performing (i) combination processing including combining consecutive regions of the image belonging to a same content type, (ii) generating a binary tree having the n image regions as nodes, (iii) performing depth traversing of the binary tree to obtain a reading sequence, and (iv) performing intersection processing including adjusting positions of regions that intersect each other based on the reading sequence;
for each of the n image regions, processing image content in the respective image region, by processing circuitry of a server, according to the content type to which the respective image region was classified, to obtain converted content corresponding to the respective image region; and
adding the converted content corresponding to the n image regions to an electronic document, to obtain the target document.
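The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how the binary tree is built, how intersecting regions are adjusted, or which content types exist, so the following are all assumptions made for illustration: the names Region, Node, merge_consecutive, reads_before, insert, reading_sequence, resolve_intersections, and convert; the "text", "table", and "picture" labels; the top-to-bottom, left-to-right ordering rule; the in-order form of depth-first traversal; and the rule of moving a later region's top edge below an earlier region it overlaps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Region:
    kind: str  # content type label (assumed: "text", "table", "picture")
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class Node:
    region: Region
    before: Optional["Node"] = None  # subtree of regions read earlier
    after: Optional["Node"] = None   # subtree of regions read later


def merge_consecutive(regions: List[Region]) -> List[Region]:
    """(i) Combine consecutive regions of the same content type into one box."""
    merged: List[Region] = []
    for r in regions:
        if merged and merged[-1].kind == r.kind:
            p = merged[-1]
            merged[-1] = Region(p.kind, min(p.left, r.left), min(p.top, r.top),
                                max(p.right, r.right), max(p.bottom, r.bottom))
        else:
            merged.append(r)
    return merged


def reads_before(a: Region, b: Region) -> bool:
    """Assumed ordering rule: higher on the page first, then further left."""
    if a.bottom <= b.top:
        return True
    if b.bottom <= a.top:
        return False
    return a.left < b.left


def insert(root: Optional[Node], region: Region) -> Node:
    """(ii) Add a region to a binary tree whose nodes are the regions."""
    if root is None:
        return Node(region)
    if reads_before(region, root.region):
        root.before = insert(root.before, region)
    else:
        root.after = insert(root.after, region)
    return root


def reading_sequence(root: Optional[Node]) -> List[Region]:
    """(iii) Depth-first (in-order) traversal yields the reading sequence."""
    if root is None:
        return []
    return reading_sequence(root.before) + [root.region] + reading_sequence(root.after)


def resolve_intersections(ordered: List[Region]) -> List[Region]:
    """(iv) Assumed rule: push a region's top below any earlier region it overlaps."""
    fixed: List[Region] = []
    for r in ordered:
        top = r.top
        for p in fixed:
            if p.left < r.right and r.left < p.right and p.top < r.bottom and top < p.bottom:
                top = p.bottom
        fixed.append(Region(r.kind, r.left, top, r.right, max(r.bottom, top)))
    return fixed


def convert(regions: List[Region],
            handlers: Dict[str, Callable[[Region], str]]) -> List[str]:
    """Process each region by its content type and append it to the document."""
    root: Optional[Node] = None
    for r in merge_consecutive(regions):
        root = insert(root, r)
    ordered = resolve_intersections(reading_sequence(root))
    return [handlers[r.kind](r) for r in ordered]


if __name__ == "__main__":
    page = [Region("text", 0, 0, 100, 10), Region("text", 0, 10, 100, 20),
            Region("table", 0, 25, 45, 60), Region("picture", 50, 18, 100, 60)]
    # Placeholder handlers; a real system would run OCR, table parsing, etc.
    handlers = {k: (lambda r, k=k: f"{k}:{r.top}-{r.bottom}")
                for k in ("text", "table", "picture")}
    print(convert(page, handlers))
```

In this sketch the handler table stands in for the per-type processing of the claim: each content type maps to its own conversion routine, and the returned list stands in for the electronic document to which the converted content is added in reading order.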