CPC G06F 16/2246 (2019.01) [G06F 16/221 (2019.01); G06F 16/243 (2019.01); G06F 40/205 (2020.01)] | 18 Claims |
1. A method comprising:
receiving one or more regulatory documents, wherein the one or more regulatory documents are from one or more regulation sources or include one or more regulatory-based words;
parsing the one or more regulatory documents, wherein the parsing comprises:
partitioning text of the one or more regulatory documents into segments of text, wherein each segment of text is located between adjacent formatting features;
organizing the segments of text according to a hierarchy, wherein the hierarchy is indicative of a structure of the segments of text in the one or more regulatory documents, wherein the segments of text are organized into a predetermined, single format;
creating a regulatory tree, the regulatory tree including a plurality of nodes and a plurality of edges, each of the plurality of nodes representing one of the segments of text and each of the plurality of edges representing a relationship between two of the segments of text, wherein creating the regulatory tree comprises:
storing the organized segments of text into one or more files, wherein each of the one or more files includes one or more lines, each of the one or more lines associated with one of the segments of text;
retrieving a targeted line of the one or more lines;
concatenating values in non-text fields of the targeted line and inserting columns between the values;
storing the concatenated values in one or more first columns;
storing a text field of the targeted line in one or more second columns;
specifying the one or more first columns as corresponding to the plurality of edges;
specifying the one or more second columns as corresponding to the plurality of nodes;
organizing the plurality of nodes in the regulatory tree in levels in the hierarchy; and
connecting nodes having different levels in the hierarchy using the plurality of edges; and
storing the one or more regulatory trees in a datastore or sending the one or more regulatory trees to one or more downstream applications.
|
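The tree-creation steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. The field names (`part`, `section`, `paragraph`, `text`), the `SEP` delimiter placed between concatenated values, the `Node` class, and the sample lines are all hypothetical, and the sketch assumes each parent line is stored before its children.

```python
from dataclasses import dataclass, field

SEP = "|"  # assumed separator inserted between concatenated field values


@dataclass
class Node:
    key: str    # concatenated non-text values (the "first column", used for edges)
    text: str   # segment text (the "second column", used for the node)
    level: int  # depth of the node in the hierarchy
    children: list = field(default_factory=list)


def to_columns(line: dict) -> tuple:
    """Return (edge column, node column) for one stored line."""
    # Concatenate every non-empty, non-text field value, in field order.
    values = [str(v) for k, v in line.items() if k != "text" and v not in (None, "")]
    return SEP.join(values), line["text"]


def build_tree(lines: list) -> Node:
    """Organize nodes by level and connect each node to its parent one level up."""
    root = Node(key="", text="ROOT", level=0)
    by_key = {"": root}
    for line in lines:
        key, text = to_columns(line)
        if not key:
            raise ValueError("line has no non-text field values")
        node = Node(key=key, text=text, level=key.count(SEP) + 1)
        # The parent key is the edge column with its last component removed.
        parent_key = key.rpartition(SEP)[0]
        # Fall back to the root if the parent line has not been seen.
        parent = by_key.get(parent_key, root)
        parent.children.append(node)  # edge between nodes on different levels
        by_key[key] = node
    return root


# Hypothetical stored lines, one per segment of text.
lines = [
    {"part": "1", "section": "", "paragraph": "", "text": "General provisions"},
    {"part": "1", "section": "2", "paragraph": "", "text": "Definitions"},
    {"part": "1", "section": "2", "paragraph": "a", "text": "Meaning of covered entity"},
]
tree = build_tree(lines)
```

In this sketch the concatenated key doubles as the address of a node, so the edge to a parent is recovered by dropping the last component of the key. A real implementation could equally store explicit parent identifiers in the first columns.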