CPC G06N 5/022 (2013.01) [G06F 16/908 (2019.01); G06F 16/93 (2019.01); G06F 40/284 (2020.01); G06F 40/289 (2020.01); G06F 40/30 (2020.01); G06N 5/025 (2013.01); G06N 20/00 (2019.01)]
17 Claims
1. A method implemented by an information handling system that includes a processor and a memory accessible by the processor, the method comprising:
receiving a document and a document type, wherein the document type identifies a document category to which the received document belongs;
identifying a set of structural linguistic features found in the document, wherein the set of structural linguistic features is associated with the document type;
retrieving a set of structural linguistic metrics corresponding to the document type;
comparing the set of structural linguistic features found in the document to the retrieved set of structural linguistic metrics;
automatically determining a quality of the received document based on the comparing, wherein the determining includes computing a quality score corresponding to the determined quality; and
in response to the quality score meeting a quality threshold, ingesting the document into a corpus that is utilized by a question-answering (QA) system.
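The claimed steps (receive a document and its type, extract structural linguistic features, compare them to type-specific metrics, score, and ingest only above a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say which features, metrics, or scoring rule are used. The names `METRICS_BY_TYPE`, `extract_features`, `quality_score`, and `maybe_ingest`, the three example features, their expected ranges, and the 0.75 threshold are all hypothetical.

```python
import re

# HYPOTHETICAL per-document-type metrics: an expected (min, max) range for each
# structural linguistic feature. Feature names and ranges are illustrative only.
METRICS_BY_TYPE: dict[str, dict[str, tuple[float, float]]] = {
    "news_article": {
        "avg_sentence_length": (8.0, 30.0),  # words per sentence
        "paragraph_count": (3.0, 200.0),
        "question_ratio": (0.0, 0.2),  # share of sentences ending in "?"
    },
}


def extract_features(text: str) -> dict[str, float]:
    """Compute a few simple structural features (a hypothetical choice)."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = text.split()
    n_sent = max(len(sentences), 1)  # avoid division by zero on empty input
    return {
        "avg_sentence_length": len(words) / n_sent,
        "paragraph_count": float(len(paragraphs)),
        "question_ratio": sum(s.endswith("?") for s in sentences) / n_sent,
    }


def quality_score(
    features: dict[str, float], metrics: dict[str, tuple[float, float]]
) -> float:
    """Fraction of metrics whose expected range contains the observed value."""
    if not metrics:
        return 0.0  # no metrics known for this document type
    hits = sum(
        lo <= features.get(name, float("nan")) <= hi  # NaN never falls in range
        for name, (lo, hi) in metrics.items()
    )
    return hits / len(metrics)


def maybe_ingest(
    text: str, doc_type: str, corpus: list[str], threshold: float = 0.75
) -> float:
    """Score the document; append it to the QA corpus if the score meets threshold."""
    metrics = METRICS_BY_TYPE.get(doc_type, {})
    score = quality_score(extract_features(text), metrics)
    if score >= threshold:
        corpus.append(text)
    return score
```

Scoring here is simply the share of expected ranges that the document satisfies. The claim only requires that some quality score be computed from the comparison, so a weighted or learned scoring model would fit the same outline.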