CPC G06F 16/258 (2019.01) [G06F 16/212 (2019.01)] | 20 Claims |
1. A method comprising:
reading data of one or more files;
identifying one or more data types corresponding to the one or more files;
recommending at least one destination database for the data based at least in part on the one or more data types; and
generating a data model to use in connection with writing the data to the at least one destination database;
wherein the generation of the data model is based at least in part on the one or more data types corresponding to the one or more files;
wherein at least the generation of the data model is performed using one or more machine learning algorithms;
wherein generating the data model comprises using the one or more machine learning algorithms to:
identify a plurality of attributes to select as candidates for at least one node of the at least one destination database;
compute information gain for respective ones of the one or more attributes, wherein at least one attribute with a higher information gain than remaining ones of the plurality of attributes is selected as the at least one node of the at least one destination database; and
predict one or more partitions for the destination database, the one or more partitions respectively storing portions of the data; and
wherein the method is executed by a processing device operatively coupled to a memory.
|
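The attribute-selection step recited in claim 1 (computing information gain for candidate attributes and selecting the one with the greatest gain as a node) can be illustrated with a minimal sketch. The claim does not specify how information gain is computed or against which target it is measured, so this sketch assumes the Shannon-entropy definition commonly used in decision-tree learning and a hypothetical `label` field as the target. The function names `entropy`, `information_gain`, and `select_node_attribute`, and the sample records, are illustrative assumptions and do not come from the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def select_node_attribute(rows, candidates, target):
    """Return the candidate attribute with the greatest information gain, plus all gains."""
    gains = {a: information_gain(rows, a, target) for a in candidates}
    best = max(gains, key=gains.get)
    return best, gains


# Hypothetical records standing in for data read from the one or more files.
rows = [
    {"region": "EU", "tier": "gold", "label": "A"},
    {"region": "EU", "tier": "silver", "label": "A"},
    {"region": "US", "tier": "gold", "label": "B"},
    {"region": "US", "tier": "silver", "label": "B"},
]

best, gains = select_node_attribute(rows, ["region", "tier"], "label")
print(best, gains)
```

In this toy data, `region` separates the labels perfectly while `tier` does not separate them at all, so `region` is the attribute that would be selected as the node under the stated assumptions.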