| CPC G06F 16/2462 (2019.01) | 3 Claims |

|
1. A computing system facilitating data redundancy detection and consolidation, the computing system comprising:
at least one processor;
a communication interface communicatively coupled to the at least one processor; and
a memory device storing executable code that, when executed, causes the at least one processor to:
perform data analysis on at least two separate datasets to identify any redundancies, the data analysis including:
comparing at least one first column name of one or more first columns of first data values of a first dataset of the at least two separate datasets with at least one second column name of one or more second columns of second data values of a second dataset of the at least two separate datasets, the comparing including deriving semantic meaning from the at least one first column name and the at least one second column name;
determining the at least one first column name has a same meaning as the at least one second column name;
based on the determining, identifying a first maximum value of the first data values, a first mean value of the first data values, and a first range of values of the first data values, a second maximum value of the second data values, a second mean value of the second data values, and a second range of values of the second data values; and
comparing the first maximum value to the second maximum value, the first mean value to the second mean value, and the first range to the second range, and based thereon calculating a likelihood of similarity between the first dataset and the second dataset; and
transmit one or more electronic communications to one or more computing devices to alert a user of the likelihood of similarity between the first dataset and the second dataset;
receive, from the computing device, one or more inputs indicating the user desires for one or more datasets of the at least two separate datasets to be consolidated; and
consolidate, based on the one or more inputs, the one or more datasets of the at least two separate datasets thereby saving space at the one or more data storage locations.
|
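The data-analysis steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not specify how semantic meaning is derived from column names or how the likelihood of similarity is calculated, so everything below is an assumption made for illustration only: the `SYNONYMS` lookup table stands in for semantic matching, the `closeness` ratio is one possible way to compare two statistics, and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical synonym table standing in for "deriving semantic meaning"
# from column names; the claim does not say how meaning is derived.
SYNONYMS = {
    "cust_id": "customer_id",
    "client_id": "customer_id",
    "amt": "amount",
    "total": "amount",
}


def canonical_name(column_name):
    """Normalize a column name and map known synonyms to one shared label."""
    key = column_name.strip().lower().replace(" ", "_").replace("-", "_")
    return SYNONYMS.get(key, key)


def summarize(values):
    """Return the three statistics named in the claim: maximum, mean, range."""
    return {
        "max": max(values),
        "mean": mean(values),
        "range": max(values) - min(values),
    }


def closeness(a, b):
    """Assumed comparison: 1.0 when equal, falling toward 0.0 as values diverge."""
    if a == b:
        return 1.0
    scale = max(abs(a), abs(b))  # nonzero here because a != b
    return max(0.0, 1.0 - abs(a - b) / scale)


def similarity_likelihood(first, second):
    """Score two datasets, each given as a dict of column name -> numeric values.

    Only columns whose names share a meaning are compared; the score is the
    average closeness of their maximum, mean, and range.
    """
    second_by_meaning = {canonical_name(n): v for n, v in second.items()}
    scores = []
    for name, values in first.items():
        other = second_by_meaning.get(canonical_name(name))
        if other is None:
            continue  # no column with the same meaning in the second dataset
        s1, s2 = summarize(values), summarize(other)
        scores.append(mean([closeness(s1[k], s2[k]) for k in ("max", "mean", "range")]))
    return mean(scores) if scores else 0.0
```

In a system of the kind claimed, a score like this would feed the alert transmitted to the user, and consolidation would proceed only after the user's input is received. The threshold at which an alert is sent is likewise not stated in the claim and would be a design choice.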