US 11,928,180 B2
Automatic ground truth selection
Deepak Sekar, Chennai (IN); Anil Manohar Omanwar, Pune (IN); Drew Johnson, Cottesloe (AU); and Salil Ahuja, Washington, DC (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Mar. 18, 2021, as Appl. No. 17/205,734.
Prior Publication US 2022/0300756 A1, Sep. 22, 2022
Int. Cl. G06F 18/213 (2023.01); G06F 9/38 (2018.01); G06F 16/9035 (2019.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01)
CPC G06F 18/213 (2023.01) [G06F 9/3836 (2013.01); G06F 16/9035 (2019.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A system, comprising:
at least one processing component;
at least one memory component;
a ground truth repository;
a document processor configured to receive text;
an orchestration module configured to:
determine that the text includes a first text unit;
determine that the text includes a second text unit; and
determine that the text includes a portion comprising the first and second text units;
a feature extractor, configured to:
in response to the determining that the text includes the first text unit, extract features from the first text unit;
in response to the determining that the text includes the second text unit, extract features from the second text unit; and
in response to the determining that the text includes the portion, aggregate the features extracted from the first and second text units;
a scoring module, configured to:
generate a set of scores, comprising:
at least one score based on the features extracted from the first text unit;
at least one score based on the features extracted from the second text unit; and
at least one score based on the aggregated features; and
a candidate selector, configured to:
select, based on the set of scores, at least one ground truth candidate from the first text unit, the second text unit, and the portion;
determine that the at least one ground truth candidate includes at least one confirmed ground truth, wherein the determining comprises:
providing a question generated by a question generator based on the at least one ground truth candidate; and
receiving, from a user, an answer to the question, wherein the answer confirms the ground truth; and
add the confirmed ground truth to the ground truth repository.
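
The data flow recited in claim 1 (per-unit feature extraction, aggregation for the combined portion, scoring, candidate selection, question-based confirmation, and storage) can be illustrated with a minimal Python sketch. The claim does not specify which features are extracted, how they are aggregated or scored, or how the question is worded, so every function name, feature, and formula below is a hypothetical stand-in chosen only for illustration and is not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Candidate:
    label: str   # "unit1", "unit2", or "portion" (hypothetical labels)
    text: str
    score: float


def extract_features(unit: str) -> Features:
    """Toy feature extractor (assumption): token and distinct-token counts."""
    tokens = unit.lower().split()
    return {"tokens": float(len(tokens)), "distinct": float(len(set(tokens)))}


def aggregate(a: Features, b: Features) -> Features:
    """Aggregate two units' features by summing each one (assumption)."""
    return {key: a[key] + b[key] for key in a}


def score(features: Features) -> float:
    """Toy score (assumption): ratio of distinct tokens to all tokens."""
    if features["tokens"] == 0:
        return 0.0
    return features["distinct"] / features["tokens"]


def select_candidates(unit1: str, unit2: str, top_k: int = 1) -> List[Candidate]:
    """Score both units and the portion containing them; keep the top_k."""
    f1, f2 = extract_features(unit1), extract_features(unit2)
    scored = [
        Candidate("unit1", unit1, score(f1)),
        Candidate("unit2", unit2, score(f2)),
        # The portion is scored from the aggregated features of both units.
        Candidate("portion", f"{unit1} {unit2}", score(aggregate(f1, f2))),
    ]
    return sorted(scored, key=lambda c: c.score, reverse=True)[:top_k]


def generate_question(candidate: Candidate) -> str:
    """Hypothetical question generator: a yes/no prompt about the candidate."""
    return f'Is this statement correct? "{candidate.text}"'


def confirm_and_store(
    candidates: List[Candidate],
    ask: Callable[[str], bool],
    repository: List[str],
) -> List[str]:
    """Add each candidate to the repository only if the user's answer confirms it."""
    for candidate in candidates:
        if ask(generate_question(candidate)):
            repository.append(candidate.text)
    return repository
```

In this sketch, `ask` stands in for the user interaction: any callable that takes the generated question and returns `True` when the answer confirms the candidate. A plain list plays the role of the ground truth repository.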