| CPC G06F 16/2272 (2019.01) [G06F 16/182 (2019.01); G06F 16/211 (2019.01); G06F 16/214 (2019.01); G06F 16/2455 (2019.01); G06F 16/24564 (2019.01); G06F 16/435 (2019.01)] | 21 Claims |

|
1. A method for loading semi-structured data into a data storage structure operable to accept and respond to structured queries, comprising:
deriving, by at least one electronic processing device and by execution of schema derivation rules stored in at least one non-transitory computer readable medium, a schema, by:
receiving, from a user and via a User Interface (UI), first input comprising a command and an identifier of a semi-structured data source;
extracting, utilizing the identifier of the semi-structured data source and from the semi-structured data source, a listing of tables and fields;
generating a unique identifier for each extracted table and field;
identifying keys linking two or more extracted tables;
creating a plurality of output files descriptive of the extracted tables and fields, the unique identifiers for the tables and fields, and the keys, thereby defining the schema;
generating, based on the plurality of output files, a plurality of table create statements; and
storing the plurality of table create statements in a schema definition file;
creating, by the at least one electronic processing device and by execution of the base layer creation rules stored in the at least one non-transitory computer readable medium, a base layer, by:
extracting, for each record in the semi-structured data source, all fields and corresponding values;
comparing each of the extracted fields and values to the plurality of output files descriptive of the extracted tables and fields;
mapping each of the extracted field values to a base layer table identified by a base layer table name;
storing an indication of the mapping for each of the extracted field values in a partition of a distributed file system;
creating, by executing at least one of the table create statements stored in the schema definition file, a plurality of base layer tables;
writing, utilizing the stored mapping for each of the extracted field values, each of the extracted field values into a corresponding base layer table of the plurality of base layer tables; and
creating, by the at least one electronic processing device and by execution of the Single Subject Layer (SSL) creation rules stored in the at least one non-transitory computer readable medium, an SSL layer, by:
creating, utilizing a mapping sheet comprising at least one column of data descriptive of at least one data relationship, an SSL configuration file;
generating, automatically, by executing a generic script and utilizing the SSL configuration file, SSL table creation code;
creating, utilizing the SSL table creation code, a plurality of SSL tables; and
loading the plurality of SSL tables with the corresponding values from the extracted fields of the records in the semi-structured data source.
|
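As an illustration only, the schema derivation step of claim 1 (listing tables and fields, assigning unique identifiers, identifying linking keys, and generating table create statements) might be sketched as follows. This is a minimal sketch, not the claimed implementation. It assumes the semi-structured source is a list of JSON-like records, that each nested object becomes its own table, that every column is typed as TEXT, and that the schema is held in memory rather than written to output files or a schema definition file. The names `derive_schema`, `table_create_statements`, `row_id`, and `parent_row_id` are hypothetical and do not come from the claim.

```python
def derive_schema(records, root_table="root"):
    """Derive tables, fields, identifiers, and linking keys from nested records."""
    tables = {}   # table name -> field names in first-seen order
    parents = {}  # child table name -> parent table name

    def walk(obj, table, parent=None):
        fields = tables.setdefault(table, [])
        if parent is not None:
            parents[table] = parent
        for key, value in obj.items():
            children = value if isinstance(value, list) else [value]
            if children and all(isinstance(c, dict) for c in children):
                # A nested object (or list of objects) becomes a child table.
                for child in children:
                    walk(child, f"{table}_{key}", table)
            elif key not in fields:
                # Scalars and lists of scalars are treated as plain fields.
                fields.append(key)

    for record in records:
        walk(record, root_table)

    # Unique identifier for each extracted table and each extracted field.
    table_ids = {name: f"T{i}" for i, name in enumerate(tables, start=1)}
    field_ids = {
        (table, field): f"{table_ids[table]}_F{j}"
        for table, fields in tables.items()
        for j, field in enumerate(fields, start=1)
    }
    # Keys linking each child table to its parent (hypothetical column name).
    keys = [
        {"child": child, "parent": parent, "column": "parent_row_id"}
        for child, parent in parents.items()
    ]
    return {"tables": tables, "table_ids": table_ids,
            "field_ids": field_ids, "keys": keys}


def table_create_statements(schema):
    """Generate one CREATE TABLE statement per derived table (all columns TEXT)."""
    linked = {key["child"] for key in schema["keys"]}
    statements = []
    for table, fields in schema["tables"].items():
        columns = ["row_id TEXT"]
        if table in linked:
            columns.append("parent_row_id TEXT")
        columns += [f"{field} TEXT" for field in fields]
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)});")
    return statements


# Usage with two illustrative records (made-up sample data).
sample_records = [
    {"id": 1, "name": "a", "items": [{"sku": "x", "qty": 2}]},
    {"id": 2, "email": "b@example.com"},
]
sample_schema = derive_schema(sample_records)
for statement in table_create_statements(sample_schema):
    print(statement)
```

In this sketch the union of fields seen across all records defines each table, so a field present in only some records still receives a column. A fuller version would persist the schema to output files and infer column types, which the sketch deliberately leaves out.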