CPC G06F 16/322 (2019.01) [G06F 16/3335 (2019.01); G06F 40/14 (2020.01); G06F 40/205 (2020.01); G06F 40/30 (2020.01); G06N 5/025 (2013.01)] | 20 Claims |
1. A configurable, streaming hybrid-analytics platform comprising:
an extraction, translate, and load (ETL) module configured to collect one or more documents from one or more document sources;
an extraction engine configured to receive a document for extraction from the ETL module and to perform an extraction process, the extraction process including:
identifying a rule book to apply during extraction, wherein the rule book includes one or more text extraction rules, wherein each of the text extraction rules includes at least one match expression, and wherein the at least one match expression includes at least one pattern;
searching a pattern tree on a first set of text in the document to determine whether a pattern hit exists, wherein the pattern tree represents the at least one pattern included in the at least one match expression in a text extraction rule in the identified rule book, and wherein a hit indicates a match between the document text and a pattern in the pattern tree;
mapping each identified pattern hit to a rule in the one or more text extraction rules in the rule book to generate a set of mapped rules;
for each mapped rule, evaluating one or more predicates included in the mapped rule to determine whether a rule hit exists;
generating a citation for each identified rule hit;
determining whether additional rule books remain to apply during extraction;
if additional rule books remain to apply, identifying a next rule book and repeating the searching, mapping, evaluating, generating, and determining steps; and
if no additional rule books remain to apply, storing the generated citations in a citation database; and
a query engine configured to search for user-defined patterns or events.
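The extraction process recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claimed steps, not the patented implementation. Every name in it (`Rule`, `RuleBook`, `Citation`, `Node`, `build_pattern_tree`, `search_pattern_tree`, `extract`) is hypothetical, and so are the choices of a token-level trie as the pattern tree, whitespace tokenization, predicates that are plain callables over the document text and the rule's pattern hits, and an in-memory list standing in for the citation database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, int]  # (matched pattern, token offset in the document)


@dataclass
class Rule:
    name: str
    patterns: List[str]  # patterns taken from the rule's match expressions
    # Assumption: a predicate is a callable over (text, hits for this rule).
    predicates: List[Callable[[str, List[Hit]], bool]] = field(default_factory=list)


@dataclass
class RuleBook:
    name: str
    rules: List[Rule]


@dataclass(frozen=True)
class Citation:
    rule_book: str
    rule: str
    hits: Tuple[Hit, ...]


class Node:
    """One node of the pattern tree (a token-level trie in this sketch)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.terminals: List[Tuple[str, Rule]] = []  # patterns ending here


def build_pattern_tree(book: RuleBook) -> Node:
    """Insert every pattern of every rule in the rule book into one trie."""
    root = Node()
    for rule in book.rules:
        for pattern in rule.patterns:
            node = root
            for token in pattern.lower().split():
                node = node.children.setdefault(token, Node())
            node.terminals.append((pattern, rule))
    return root


def search_pattern_tree(root: Node, text: str) -> List[Tuple[str, Rule, int]]:
    """Return (pattern, rule, offset) for every pattern found in the text."""
    tokens = text.lower().split()  # simplification: no punctuation handling
    found = []
    for start in range(len(tokens)):
        node = root
        for token in tokens[start:]:
            node = node.children.get(token)
            if node is None:
                break
            found.extend((p, r, start) for p, r in node.terminals)
    return found


def extract(text: str, rule_books: List[RuleBook],
            citation_db: List[Citation]) -> List[Citation]:
    """Apply each rule book in turn, then store the generated citations."""
    citations: List[Citation] = []
    for book in rule_books:  # repeats until no rule books remain
        tree = build_pattern_tree(book)
        # Map each pattern hit to its rule to obtain the set of mapped rules.
        mapped: Dict[str, Tuple[Rule, List[Hit]]] = {}
        for pattern, rule, offset in search_pattern_tree(tree, text):
            mapped.setdefault(rule.name, (rule, []))[1].append((pattern, offset))
        # A rule hit exists when all predicates of a mapped rule hold.
        for rule, hits in mapped.values():
            if all(pred(text, hits) for pred in rule.predicates):
                citations.append(Citation(book.name, rule.name, tuple(hits)))
    citation_db.extend(citations)  # stand-in for the citation database
    return citations
```

In this sketch a rule with no predicates produces a rule hit as soon as any of its patterns is found, and a rule whose patterns never match is not evaluated at all, since only mapped rules reach the predicate step. A production pattern tree would more likely use an Aho-Corasick automaton or similar structure so that the document is scanned once rather than once per starting token.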