US 12,254,263 B2
Method, device, and system for analyzing unstructured document
Dong Hwan Kim, Seoul (KR); Hyun Ok Kim, Gwangmyeong-si (KR); Seong Woo Park, Seoul (KR); Jae Yeob Jung, Seoul (KR); Yo Han Moon, Seoul (KR); and Min Sun Song, Seoul (KR)
Assigned to 42Maru Inc., Seoul (KR)
Filed by 42Maru Inc., Seoul (KR)
Filed on Dec. 9, 2021, as Appl. No. 17/643,451.
Claims priority of application No. 10-2021-0172587 (KR), filed on Dec. 6, 2021.
Prior Publication US 2023/0177251 A1, Jun. 8, 2023
Int. Cl. G06F 40/106 (2020.01); G06F 16/332 (2019.01); G06F 16/34 (2019.01); G06F 40/109 (2020.01); G06F 40/186 (2020.01); G06N 3/08 (2023.01)
CPC G06F 40/106 (2020.01) [G06F 16/3323 (2019.01); G06F 16/34 (2019.01); G06F 40/109 (2020.01); G06F 40/186 (2020.01); G06N 3/08 (2013.01)]
18 Claims
 
1. An unstructured document analysis method that analyzes an unstructured document and generates an answer to a content query related to content included in the unstructured document, the unstructured document analysis method comprising operations of:
acquiring unstructured document data including font characteristic data and document structure data;
classifying the unstructured document data into a plurality of sectors when the font characteristic data included in the unstructured document data satisfies a predefined rule and extracting texts included in the unstructured document data for each of the plurality of sectors;
classifying the extracted texts into pre-classified items using a trained neural network model;
acquiring a content query related to the content included in the unstructured document data and associated with a pre-classified item among the pre-classified items; and
generating an answer to the content query on the basis of the extracted texts classified into the pre-classified items,
wherein the operation of acquiring the content query comprises operations of:
acquiring template data for summarizing the unstructured document data, the template data including a plurality of pre-classified items, keys and values, wherein a plurality of pieces of item information corresponding to the plurality of pre-classified items are allocated to the keys, and a plurality of answers corresponding to the plurality of pieces of item information are allocated to the values, and the plurality of pieces of item information are matched with the plurality of answers, respectively, and
recognizing item information corresponding to the pre-classified item and allocated to a key among the keys and acquiring the content query on the basis of the item information,
wherein a plurality of form conditions corresponding to the plurality of pieces of item information are predefined, and
wherein the operation of generating the answer to the content query comprises operations of:
calculating a correlation between each of pre-classified items by the trained neural network model,
acquiring an expected answer to the content query on the basis of a text classified as a pre-classified item with the highest correlation among the pre-classified items,
comparing the expected answer with a predefined form condition of the plurality of form conditions, and
determining that the expected answer is the answer to the content query based on a form of the expected answer satisfying the predefined form condition.
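
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify the font rule, the network architecture, or the form conditions, so everything concrete here is an assumption made for illustration: the `Span` type, the bold-and-size heading rule, the item names, the regular expressions used as form conditions, and the colon-based answer extraction. In particular, `overlap` is a simple word-overlap score standing in for the correlation that the claim attributes to a trained neural network model.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical unit of document data: text plus font characteristics."""
    text: str
    font_size: float
    bold: bool


def split_into_sectors(spans, heading_min_size=12.0):
    """Start a new sector whenever a span satisfies the assumed font rule
    (bold and at least heading_min_size), then join the text per sector."""
    sectors = []
    for span in spans:
        is_heading = span.bold and span.font_size >= heading_min_size
        if is_heading or not sectors:
            sectors.append([])
        sectors[-1].append(span.text)
    return [" ".join(parts) for parts in sectors]


def overlap(a, b):
    """Stand-in for the trained model's correlation score:
    Jaccard overlap of the lowercase word sets of a and b."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def classify(sector_texts, items):
    """Assign each sector text to the pre-classified item it scores highest on."""
    classified = {}
    for text in sector_texts:
        best_item = max(items, key=lambda item: overlap(item, text))
        classified[best_item] = text
    return classified


def fill_template(template, classified, form_conditions):
    """For each template key (item information), pick the classified item with
    the highest score, extract an expected answer, and keep it as the value
    only if it satisfies that key's form condition."""
    filled = dict(template)
    for item_info in template:
        if not classified:
            continue
        best_item = max(classified, key=lambda item: overlap(item_info, item))
        # Assumed extraction rule: the answer is whatever follows the first colon.
        expected = classified[best_item].split(":", 1)[-1].strip()
        if re.fullmatch(form_conditions[item_info], expected):
            filled[item_info] = expected
    return filled


# Hypothetical document, template, and form conditions.
spans = [
    Span("Contract Date", 14.0, True),
    Span("Signed on: 2021-12-06", 10.0, False),
    Span("Payment Amount", 14.0, True),
    Span("Total: 5000 USD", 10.0, False),
]
template = {"contract date": None, "payment amount": None}
form_conditions = {
    "contract date": r"\d{4}-\d{2}-\d{2}",
    "payment amount": r"\d+ [A-Z]{3}",
}

sectors = split_into_sectors(spans)
classified = classify(sectors, list(template))
filled = fill_template(template, classified, form_conditions)
print(filled)
```

In this sketch the template keys double as the content queries, and a value is filled in only when the expected answer passes its form condition. An expected answer such as "last Monday" would fail the date pattern and leave the value empty, which mirrors the final comparing and determining operations of the claim.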