CPC G06F 16/906 (2019.01) [G06F 16/901 (2019.01); G06F 16/93 (2019.01)] | 24 Claims |
1. A document summary generating system comprising:
at least one processor; and
a memory storing a sequence of instructions which when executed by the at least one processor configures the at least one processor to:
obtain a document;
divide the document into chunks of content;
encode each chunk of the chunks of content to obtain encoded chunks of content;
cluster the encoded chunks of content into clusters of encoded chunks;
determine at least one central encoded chunk in each cluster of the clusters of encoded chunks;
generate a summary for the document based on the at least one central encoded chunk for each cluster of the clusters of encoded chunks;
determine a similarity score between a ground truth graph associated with the document and a predicted graph associated with the document, the at least one processor configured to:
obtain ground truth data with manually applied labels;
generate a graph for the ground truth data with manually applied labels;
generate a predicted graph using predicted attributes associated with the document, the at least one processor configured to:
obtain classified pages and unclassified pages from the document using a known document classifier;
extract known attributes from the classified pages using a document type classifier; and
extract the predicted attributes from the unclassified pages using a page classifier; and
determine a graph edit distance between the generated graph for the ground truth data and the predicted graph.
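For illustration only, the summarization steps recited above (divide, encode, cluster, select a central encoded chunk per cluster, generate the summary) can be sketched as follows. The claim does not name a chunking rule, an encoder, or a clustering algorithm, so every choice here is an assumption: sentence-level chunks, a normalized bag-of-words vector standing in for a learned encoder, a small k-means loop with deterministic initialization, and the chunk nearest each centroid taken as the "central" chunk. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text):
    """Divide the document into chunks (assumption: one chunk per sentence)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def encode_chunks(chunks):
    """Encode each chunk as an L2-normalized bag-of-words vector
    (a stand-in for whatever encoder a real system would use)."""
    tokenized = [re.findall(r"[a-z0-9]+", c.lower()) for c in chunks]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for token, count in Counter(toks).items():
            vec[index[token]] = float(count)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Cluster the encoded chunks with a basic k-means loop.
    Initial centroids are evenly spaced input vectors, so runs are repeatable."""
    step = len(vectors) / k
    centroids = [list(vectors[int(i * step)]) for i in range(k)]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def central_chunk_indices(vectors, assignment, centroids):
    """For each non-empty cluster, pick the chunk closest to the centroid."""
    central = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            central.append(
                min(members, key=lambda i: squared_distance(vectors[i], centroid))
            )
    return sorted(central)  # keep document order in the summary


def summarize(document, k=2):
    """Generate a summary from one central chunk per cluster."""
    chunks = split_into_chunks(document)
    vectors = encode_chunks(chunks)
    k = max(1, min(k, len(chunks)))
    assignment, centroids = kmeans(vectors, k)
    picked = central_chunk_indices(vectors, assignment, centroids)
    return " ".join(chunks[i] for i in picked)
```

Calling `summarize(text, k=3)` on a longer text would return up to three sentences, one per cluster, in their original order. Concatenating the central chunks is only one reading of "generate a summary ... based on the at least one central encoded chunk"; a system could equally pass those chunks to a separate summarization model.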
|
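The comparison of a ground truth graph with a predicted graph recited in the claim can likewise be sketched under stated assumptions. The claim does not define how either graph is built or how the graph edit distance is turned into a similarity score, so the following is hypothetical throughout: each graph is a document node linked to page nodes, each page node is linked to its attribute nodes, nodes are matched by identity only, every node or edge insertion or deletion costs 1, and the score is normalized by the combined size of both graphs. With identity matching the distance reduces to the symmetric difference of the node and edge sets, which is an upper bound on the general unit-cost graph edit distance.

```python
def build_graph(doc_id, page_attributes):
    """Build a (nodes, edges) graph: document -> pages -> attributes.

    page_attributes maps a page number to a dict of attribute name -> value,
    e.g. manually applied labels (ground truth) or classifier output (predicted).
    """
    doc_node = ("doc", doc_id)
    nodes = {doc_node}
    edges = set()
    for page, attributes in page_attributes.items():
        page_node = ("page", page)
        nodes.add(page_node)
        edges.add((doc_node, page_node))
        for name, value in attributes.items():
            attr_node = ("attr", page, name, value)
            nodes.add(attr_node)
            edges.add((page_node, attr_node))
    return nodes, edges


def edit_distance(g1, g2):
    """Unit-cost edit distance with nodes matched by identity:
    count the nodes and edges present in exactly one of the two graphs."""
    (nodes1, edges1), (nodes2, edges2) = g1, g2
    return len(nodes1 ^ nodes2) + len(edges1 ^ edges2)


def similarity(g1, g2):
    """Map the distance to a score in [0, 1] (hypothetical normalization)."""
    size = sum(len(part) for part in (*g1, *g2))
    return 1.0 if size == 0 else 1.0 - edit_distance(g1, g2) / size


# Ground truth labels versus predicted attributes for a two-page document.
truth = build_graph("doc-1", {1: {"type": "invoice"}, 2: {"type": "receipt"}})
predicted = build_graph("doc-1", {1: {"type": "invoice"}, 2: {"type": "contract"}})

distance = edit_distance(truth, predicted)
score = similarity(truth, predicted)
```

In this example the graphs differ in one attribute node and its edge on each side, so the distance counts two node edits and two edge edits. A general graph edit distance would also search over node correspondences and could allow substitutions, which this sketch deliberately omits.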