CPC G06F 18/213 (2023.01) [G06F 9/3836 (2013.01); G06F 16/9035 (2019.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01)] | 20 Claims |
1. A system, comprising:
at least one processing component;
at least one memory component;
a ground truth repository;
a document processor configured to receive text;
an orchestration module configured to:
determine that the text includes a first text unit;
determine that the text includes a second text unit; and
determine that the text includes a portion comprising the first and second text units;
a feature extractor, configured to:
in response to the determining that the text includes the first text unit, extract features from the first text unit;
in response to the determining that the text includes the second text unit, extract features from the second text unit; and
in response to the determining that the text includes the portion, aggregate the features extracted from the first and second text units;
a scoring module, configured to:
generate a set of scores, comprising:
at least one score based on the features extracted from the first text unit;
at least one score based on the features extracted from the second text unit; and
at least one score based on the aggregated features; and
a candidate selector, configured to:
select, based on the set of scores, at least one ground truth candidate from the first text unit, the second text unit, and the portion;
determine that the at least one ground truth candidate includes at least one confirmed ground truth, wherein the determining comprises:
providing a question generated by a question generator based on the at least one ground truth candidate; and
receiving, from a user, an answer to the question, wherein the answer confirms the ground truth; and
add the confirmed ground truth to the ground truth repository.
|
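The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`extract_features`, `aggregate`, `score`, `generate_question`, `process`), the sentence-based splitting into text units, the capitalization-based features, and the `threshold` value are assumptions chosen only to make the example runnable. The claim itself does not specify how features are extracted or how scores are computed.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def extract_features(unit: str) -> Features:
    """Hypothetical features: token count and number of capitalized tokens."""
    tokens = unit.split()
    return {
        "tokens": float(len(tokens)),
        "capitalized": float(sum(t[0].isupper() for t in tokens)),
    }


def aggregate(a: Features, b: Features) -> Features:
    """Aggregate the features of two text units by summing them key by key."""
    return {k: a.get(k, 0.0) + b.get(k, 0.0) for k in set(a) | set(b)}


def score(features: Features) -> float:
    """Hypothetical score: fraction of tokens that are capitalized."""
    if not features["tokens"]:
        return 0.0
    return features["capitalized"] / features["tokens"]


def generate_question(candidate: str) -> str:
    """Hypothetical question generator for a ground truth candidate."""
    return f'Is the following statement correct? "{candidate}"'


def process(
    text: str,
    repository: List[str],
    ask: Callable[[str], bool],
    threshold: float = 0.3,  # assumed cutoff, not taken from the claim
) -> List[str]:
    """Score two text units and their combined portion, then confirm candidates."""
    # Orchestration (assumed): treat sentences as text units.
    units = [u.strip() for u in text.split(".") if u.strip()]
    if len(units) < 2:
        return []
    first, second = units[0], units[1]
    portion = f"{first}. {second}."

    # Feature extraction per unit, aggregation for the portion.
    f1, f2 = extract_features(first), extract_features(second)
    scores = {
        first: score(f1),
        second: score(f2),
        portion: score(aggregate(f1, f2)),
    }

    # Candidate selection, followed by user confirmation via a question.
    candidates = [c for c, s in scores.items() if s >= threshold]
    confirmed = [c for c in candidates if ask(generate_question(c))]
    repository.extend(confirmed)
    return confirmed
```

In this sketch the `ask` callback stands in for the user who answers the generated question. Only candidates for which it returns `True` are added to the repository, mirroring the confirmation step of the candidate selector.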