CPC G06F 16/258 (2019.01) [G06F 40/151 (2020.01); G06F 40/205 (2020.01); G06F 40/295 (2020.01)]
20 Claims
1. A document transformation and processing method, comprising:
presenting, by at least one processor, an interactive graphical user interface that includes:
a file section that includes a plurality of respective options for selecting from a plurality of parsing pipelines and for selecting from a plurality of electronic files;
a document viewing section that displays an electronic file corresponding to a selection made in the file section; and
a parser section that displays at least some textual content of an electronic file corresponding to a selection made in the file section, wherein the parser section includes a plurality of tabs that, when selected, respectively provide options for defining start pages in a multi-part document included in an electronic file corresponding to a selection made in the file section, for mapping fields within an electronic file corresponding to a selection made in the file section, and for extracting sections within an electronic file corresponding to a selection made in the file section;
receiving, by the at least one processor via the graphical user interface in the file section, a selection of a parsing pipeline of the plurality of parsing pipelines (a “selected parsing pipeline”) and a selection of an electronic file of the plurality of electronic files (a “selected electronic file”);
accessing, by the at least one processor, the selected electronic file in a first format;
processing, by the at least one processor, the selected electronic file to convert the selected electronic file to a second format;
accessing, by the at least one processor referencing information in at least one database, the selected parsing pipeline, including processing instructions for one or more of content extraction, entity recognition, and schema mapping;
applying, by the at least one processor, the selected parsing pipeline to at least some content in the selected electronic file;
extracting, by the at least one processor, the at least some content in the selected electronic file;
applying, by the at least one processor, entity recognition on the selected extracted content and generating output in response to the entity recognition;
mapping, by the at least one processor, the output to a respective one of a plurality of schemas;
presenting, by the at least one processor, in the interactive graphical user interface:
the selections received in the file section representing at least the selected parsing pipeline and the selected electronic file;
the selected electronic file in the document viewing section; and
at least some textual content of the selected electronic file in the parser section; and
in response to a user selection of at least some of the content of the selected electronic file displayed in the document viewing section, the at least one processor highlights corresponding mapped output displayed in the parser section and corresponding to the respective one of the plurality of schemas.
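
For illustration only, the processing steps recited above (format conversion, content extraction, entity recognition, schema mapping, and highlighting of mapped output in response to a selection) might be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: every name in it (`ParsingPipeline`, `convert_to_text`, `recognize_dates_and_amounts`, `run_pipeline`, `highlighted`) is hypothetical, the "first format" is assumed to be UTF-8 bytes and the "second format" plain text, entity recognition is approximated with regular expressions, and the graphical user interface and database are omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entity:
    label: str
    text: str
    start: int  # character offset into the extracted content
    end: int


@dataclass
class ParsingPipeline:
    """Hypothetical stand-in for a stored parsing pipeline."""
    name: str
    extract: Callable[[str], str]             # content extraction step
    recognize: Callable[[str], list[Entity]]  # entity recognition step
    schema: dict[str, str]                    # entity label -> schema field


def convert_to_text(raw: bytes) -> str:
    """Assumed conversion: first format (UTF-8 bytes) to second format (text)."""
    return raw.decode("utf-8")


def recognize_dates_and_amounts(text: str) -> list[Entity]:
    """Toy regex-based recognizer, used here in place of a real NER model."""
    patterns = {"DATE": r"\d{4}-\d{2}-\d{2}", "AMOUNT": r"\$\d+(?:\.\d{2})?"}
    found = []
    for label, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            found.append(Entity(label, m.group(), m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)


def run_pipeline(raw: bytes, pipeline: ParsingPipeline) -> list[dict]:
    """Convert, extract, recognize entities, then map each onto a schema field."""
    content = pipeline.extract(convert_to_text(raw))
    return [
        {"field": pipeline.schema[e.label], "value": e.text, "span": (e.start, e.end)}
        for e in pipeline.recognize(content)
        if e.label in pipeline.schema
    ]


def highlighted(mapped: list[dict], sel_start: int, sel_end: int) -> list[dict]:
    """Return the mapped output whose span overlaps the selected range."""
    return [m for m in mapped if m["span"][0] < sel_end and sel_start < m["span"][1]]


if __name__ == "__main__":
    pipeline = ParsingPipeline(
        name="invoice",
        extract=lambda text: text,  # keeps all content, so offsets match the document
        recognize=recognize_dates_and_amounts,
        schema={"DATE": "invoice_date", "AMOUNT": "total"},
    )
    mapped = run_pipeline(b"Invoice dated 2024-03-01 for $125.50", pipeline)
    print(mapped)
    print(highlighted(mapped, 30, 33))  # a selection inside the amount
```

In this sketch each mapped record keeps the character span it came from, which is one simple way a selection in a document view could be matched back to mapped output; the claim itself does not specify how that correspondence is stored.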