CPC G06F 16/2272 (2019.01) [G06F 16/182 (2019.01); G06F 16/211 (2019.01); G06F 16/214 (2019.01); G06F 16/2455 (2019.01); G06F 16/24564 (2019.01); G06F 16/435 (2019.01)]
21 Claims
1. A computer system utility for loading semi-structured data into a data storage structure operable to accept and respond to structured queries, comprising:
at least one electronic processing device; and
at least one non-transitory computer readable medium storing (i) semi-structured source data, (ii) schema derivation rules, (iii) base layer creation rules, (iv) Single Subject Layer (SSL) creation rules, and (v) operating instructions that when executed by the at least one electronic processing device, result in:
deriving, by execution of the schema derivation rules, a schema, by:
receiving, from a user and via a User Interface (UI), first input comprising a command and an identifier of a semi-structured data source;
extracting, utilizing the identifier of the semi-structured data source and from the semi-structured data source, a listing of tables and fields;
generating a unique identifier for each extracted table and field;
identifying keys linking two or more extracted tables;
creating a plurality of output files descriptive of the extracted tables and fields, the unique identifiers for the tables and fields, and the keys, thereby defining the schema;
generating, based on the plurality of output files, a plurality of table create statements; and
storing the plurality of table create statements in a schema definition file;
creating, by execution of the base layer creation rules, a base layer, by:
extracting, for each record in the semi-structured data source, all fields and corresponding values;
comparing each of the extracted fields and values to the plurality of output files descriptive of the extracted tables and fields;
mapping each of the extracted field values to a base layer table identified by a base layer table name;
storing an indication of the mapping for each of the extracted field values in a partition of a distributed file system;
creating, by executing at least one of the table create statements stored in the schema definition file, a plurality of base layer tables;
writing, utilizing the stored mapping for each of the extracted field values, each of the extracted field values into a corresponding base layer table of the plurality of base layer tables; and
creating, by execution of the Single Subject Layer (SSL) creation rules, an SSL layer, by:
creating, utilizing a mapping sheet comprising at least one column of data descriptive of at least one data relationship, an SSL configuration file;
generating, automatically, by executing a generic script and utilizing the SSL configuration file, SSL table creation code;
creating, utilizing the SSL table creation code, a plurality of SSL tables; and
loading the plurality of SSL tables with the corresponding values from the extracted fields of the records in the semi-structured data source.
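The schema-derivation steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the semi-structured source has already been resolved (from the UI-supplied identifier) into an in-memory list of JSON-like dicts, that nested objects and lists of objects become child tables linked to their parent by a key column, and that Hive/Spark-style type names are acceptable. The function names (`derive_schema`, `create_statements`, `run`), the `T1`/`F1` identifier scheme, the `_key` and `<parent>_key` columns, and the file names `tables.json`, `fields.json`, `keys.json`, and `schema.sql` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def sql_type(value):
    """Infer a column type from a sample value (Hive/Spark-style names, assumed)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"


def collect(record, table, seen, links):
    """Walk one record; nested objects and lists of objects become child tables."""
    fields = seen.setdefault(table, {})
    for name, value in record.items():
        items = value if isinstance(value, list) else [value]
        if items and all(isinstance(i, dict) for i in items):
            child = f"{table}__{name}"
            links.add((child, table))  # key linking child rows to their parent table
            for item in items:
                collect(item, child, seen, links)
        else:
            new = sql_type(value)
            # A field seen with conflicting types across records falls back to STRING.
            fields[name] = new if fields.get(name, new) == new else "STRING"


def derive_schema(records, root):
    """Return table, field, and key descriptors with generated unique identifiers."""
    seen, links = {}, set()
    for record in records:
        collect(record, root, seen, links)
    tables = [{"id": f"T{i}", "name": t} for i, t in enumerate(sorted(seen), 1)]
    fields = []
    for t in sorted(seen):
        for name, typ in sorted(seen[t].items()):
            fields.append({"id": f"F{len(fields) + 1}", "table": t, "name": name, "type": typ})
    keys = [{"child": c, "parent": p, "column": f"{p}_key"} for c, p in sorted(links)]
    return tables, fields, keys


def create_statements(tables, fields, keys):
    """Build one CREATE TABLE statement per extracted table."""
    stmts = []
    for t in tables:
        cols = ["`_key` STRING"]  # hypothetical surrogate row key
        cols += [f"`{k['column']}` STRING" for k in keys if k["child"] == t["name"]]
        cols += [f"`{f['name']}` {f['type']}" for f in fields if f["table"] == t["name"]]
        body = ",\n  ".join(cols)
        stmts.append(f"CREATE TABLE IF NOT EXISTS `{t['name']}` (\n  {body}\n);")
    return stmts


def run(records, root, out_dir):
    """Write the descriptor files, then generate statements from them into schema.sql."""
    out = Path(out_dir)
    tables, fields, keys = derive_schema(records, root)
    for name, data in (("tables", tables), ("fields", fields), ("keys", keys)):
        (out / f"{name}.json").write_text(json.dumps(data, indent=2))
    # Re-read the output files so the statements are generated from them.
    loaded = [json.loads((out / f"{n}.json").read_text()) for n in ("tables", "fields", "keys")]
    stmts = create_statements(*loaded)
    (out / "schema.sql").write_text("\n\n".join(stmts) + "\n")
    return stmts


if __name__ == "__main__":
    sample = [{"id": 1, "name": "Ann", "orders": [{"sku": "A1", "qty": 2}]}]
    with tempfile.TemporaryDirectory() as d:
        print("\n\n".join(run(sample, "customer", d)))
```

Regenerating the statements from the written descriptor files, rather than from the in-memory objects, mirrors the claim's ordering, in which the output files define the schema and the table create statements are derived from them.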
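The base-layer steps can be sketched in the same hedged way. The Python below assumes a hypothetical field descriptor (`FIELDS`) that stands in for the schema output files and maps dotted field paths to base tables. It also stands in two things: a local `table=<name>` directory layout in place of a partition on a real distributed file system, and an in-memory SQLite database in place of the target store. `FIELDS`, `CREATE_STATEMENTS`, `map_records`, `load_base`, and the `record_no` column are illustrative names, not part of the claimed system.

```python
import json
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical stand-in for the schema output files: where each field value goes.
FIELDS = [
    {"path": "id", "table": "customer", "column": "id"},
    {"path": "name", "table": "customer", "column": "name"},
    {"path": "address.city", "table": "address", "column": "city"},
]
# Hypothetical contents of the schema definition file (SQLite dialect for the demo).
CREATE_STATEMENTS = [
    "CREATE TABLE customer (record_no INTEGER, id INTEGER, name TEXT)",
    "CREATE TABLE address (record_no INTEGER, city TEXT)",
]


def flatten(record, prefix=""):
    """Yield (dotted_path, value) for every leaf field of a nested record."""
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def map_records(records, fields, part_root):
    """Map each field value to its base table; store the mapping partitioned by table."""
    by_path = {f["path"]: f for f in fields}
    parts = defaultdict(list)
    for record_no, record in enumerate(records):
        for path, value in flatten(record):
            spec = by_path.get(path)
            if spec is None:
                continue  # not described by the schema files; a real loader might log it
            parts[spec["table"]].append(
                {"record_no": record_no, "column": spec["column"], "value": value})
    for table, rows in parts.items():
        # One local directory per table, imitating a Hive-style partition on a DFS.
        part_dir = Path(part_root) / f"table={table}"
        part_dir.mkdir(parents=True, exist_ok=True)
        lines = "".join(json.dumps(r) + "\n" for r in rows)
        (part_dir / "part-00000.jsonl").write_text(lines)


def load_base(conn, part_root, create_statements):
    """Create the base tables, then write values into them using the stored mapping."""
    for stmt in create_statements:
        conn.execute(stmt)
    for part_dir in sorted(Path(part_root).glob("table=*")):
        table = part_dir.name.split("=", 1)[1]
        rows = defaultdict(dict)  # record_no -> {column: value}
        for line in (part_dir / "part-00000.jsonl").read_text().splitlines():
            m = json.loads(line)
            rows[m["record_no"]][m["column"]] = m["value"]
        for record_no, cols in rows.items():
            names = ["record_no", *cols]
            placeholders = ", ".join("?" * len(names))
            # Identifiers come from the trusted schema files; values are bound parameters.
            conn.execute(f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})",
                         [record_no, *cols.values()])
    conn.commit()


if __name__ == "__main__":
    records = [{"id": 1, "name": "Ann", "address": {"city": "Oslo"}},
               {"id": 2, "name": "Bo", "extra": True}]
    with tempfile.TemporaryDirectory() as d:
        map_records(records, FIELDS, d)
        conn = sqlite3.connect(":memory:")
        load_base(conn, d, CREATE_STATEMENTS)
        print(conn.execute("SELECT * FROM customer").fetchall())
```

Keeping the mapping as one small file per target table means each base table can be populated from its own partition independently, which is one plausible reason to persist the mapping before the write step.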
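A similarly hedged Python sketch of the SSL steps follows. It assumes a hypothetical CSV mapping sheet whose columns are `ssl_table`, `ssl_column`, `source_table`, and `source_column`. It also assumes that every base table shares a `record_no` join key and that the first source table listed for an SSL table drives a chain of `LEFT JOIN`s. The claim does not specify how the "generic script" builds its code; `build_config`, `generate_sql`, `run_ssl`, and `ssl_config.json` are illustrative names only, and SQLite stands in for the target query engine.

```python
import csv
import io
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical mapping sheet: one row per SSL column and the base column it comes from.
MAPPING_SHEET = """ssl_table,ssl_column,source_table,source_column
customer_profile,customer_id,customer,id
customer_profile,customer_name,customer,name
customer_profile,city,address,city
"""


def build_config(sheet_text, join_key="record_no"):
    """Group mapping-sheet rows into one configuration entry per SSL table."""
    config = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        entry = config.setdefault(row["ssl_table"], {"join_key": join_key, "columns": []})
        entry["columns"].append(row)
    return config


def generate_sql(config):
    """Generic script: emit CREATE TABLE plus INSERT ... SELECT for every SSL table."""
    code = []
    for ssl_table, spec in config.items():
        cols, key = spec["columns"], spec["join_key"]
        sources = list(dict.fromkeys(c["source_table"] for c in cols))  # first-seen order
        driver = sources[0]  # first source table drives the join (assumption)
        select = ", ".join(
            f"{c['source_table']}.{c['source_column']} AS {c['ssl_column']}" for c in cols)
        joins = "".join(f" LEFT JOIN {s} ON {s}.{key} = {driver}.{key}" for s in sources[1:])
        code.append(f"CREATE TABLE {ssl_table} ({', '.join(c['ssl_column'] for c in cols)})")
        code.append(f"INSERT INTO {ssl_table} SELECT {select} FROM {driver}{joins}")
    return code


def run_ssl(conn, sheet_text, config_path):
    """Write the SSL configuration file, generate code from it, and execute the code."""
    Path(config_path).write_text(json.dumps(build_config(sheet_text), indent=2))
    code = generate_sql(json.loads(Path(config_path).read_text()))
    for stmt in code:
        conn.execute(stmt)
    conn.commit()
    return code


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Tiny base layer keyed by record_no, as a base-layer load might leave it.
    conn.execute("CREATE TABLE customer (record_no INTEGER, id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE address (record_no INTEGER, city TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(0, 1, "Ann"), (1, 2, "Bo")])
    conn.execute("INSERT INTO address VALUES (0, 'Oslo')")
    with tempfile.TemporaryDirectory() as d:
        for stmt in run_ssl(conn, MAPPING_SHEET, Path(d) / "ssl_config.json"):
            print(stmt)
    print(conn.execute("SELECT * FROM customer_profile").fetchall())
```

Because the script only reads the configuration file, adding a new subject-oriented table in this sketch means adding rows to the mapping sheet rather than writing new SQL by hand.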