US 11,687,796 B2
Document type-specific quality model
Brien H. Muschett, Palm Beach Gardens, FL (US); Andrew R. Freed, Cary, NC (US); Roberto Delima, Apex, NC (US); David Contreras, Willow Spring, NC (US); and Krishna Mahajan, Raleigh, NC (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Apr. 17, 2019, as Appl. No. 16/386,652.
Prior Publication US 2020/0334546 A1, Oct. 22, 2020
Int. Cl. G06F 17/00 (2019.01); G06N 5/022 (2023.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01); G06F 16/908 (2019.01); G06F 40/30 (2020.01); G06F 40/284 (2020.01); G06F 40/289 (2020.01); G06N 5/025 (2023.01)
CPC G06N 5/022 (2013.01) [G06F 16/908 (2019.01); G06F 16/93 (2019.01); G06F 40/284 (2020.01); G06F 40/289 (2020.01); G06F 40/30 (2020.01); G06N 5/025 (2013.01); G06N 20/00 (2019.01)]
17 Claims
 
1. A method implemented by an information handling system that includes a processor and a memory accessible by the processor, the method comprising:
receiving a document and a document type, wherein the document type identifies a document category to which the received document belongs;
identifying a set of structural linguistic features found in the document, wherein the set of structural linguistic features are associated with the document type;
retrieving a set of structural linguistic metrics corresponding to the document type;
comparing the set of structural linguistic features found in the document to the retrieved set of structural linguistic metrics;
automatically determining a quality of the received document based on the comparing, wherein the determining includes computing a quality score corresponding to the determined quality; and
in response to the quality score meeting a quality threshold, ingesting the document into a corpus that is utilized by a question-answering (QA) system.
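
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything specific in it is an assumption made for illustration: the two features (average sentence length and share of question sentences), the per-type metric ranges in METRICS_BY_TYPE, the "faq" document type, the scoring rule (fraction of metrics satisfied), the 0.75 threshold, and all function names. The corpus is modeled as a plain list.

```python
import re

# Hypothetical metrics store: for each document type, an acceptable
# (low, high) range per structural linguistic feature. Values are invented.
METRICS_BY_TYPE = {
    "faq": {
        "avg_sentence_length": (3.0, 25.0),
        "question_ratio": (0.2, 1.0),
    },
}

QUALITY_THRESHOLD = 0.75  # assumed cutoff, not taken from the patent


def extract_features(text):
    """Compute two toy structural features from raw text."""
    # A sentence here is any run of characters ending in '.', '!' or '?'.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]", text)]
    if not sentences:
        return {"avg_sentence_length": 0.0, "question_ratio": 0.0}
    words = sum(len(s.split()) for s in sentences)
    questions = sum(1 for s in sentences if s.endswith("?"))
    return {
        "avg_sentence_length": words / len(sentences),
        "question_ratio": questions / len(sentences),
    }


def quality_score(features, metrics):
    """Return the fraction of metrics whose range contains the feature value."""
    hits = sum(
        1
        for name, (low, high) in metrics.items()
        if name in features and low <= features[name] <= high
    )
    return hits / len(metrics)


def process_document(text, doc_type, corpus):
    """Score a document against its type's metrics; ingest it if it passes."""
    metrics = METRICS_BY_TYPE[doc_type]        # retrieve metrics for the type
    features = extract_features(text)          # identify features in the document
    score = quality_score(features, metrics)   # compare and compute the score
    if score >= QUALITY_THRESHOLD:             # threshold gate before ingestion
        corpus.append(text)
    return score
```

Calling process_document(text, "faq", corpus) returns the score and appends the text to corpus only when the score meets the threshold. A production system would presumably derive the metric ranges from known-good documents of each type rather than hard-coding them.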