| CPC G06F 16/2365 (2019.01) [G06F 16/219 (2019.01); G06F 16/221 (2019.01); G06F 16/2358 (2019.01)] | 20 Claims |

|
1. A computer-implemented method, comprising:
tracking, by a computing platform, lineage data associated with database transformations executed on a set of columns within a plurality of datasets associated with the computing platform;
generating, by the computing platform, a set of column profiles corresponding to the set of columns based at least in part on the lineage data, wherein the column profile for a given column comprises at least one of derivation information of the given column and usage information of the given column;
analyzing, by the computing platform, glossary data to identify two or more columns in the set of columns that are related, wherein the glossary data comprises semantic information related to one or more common terms assigned to the two or more columns in the set of columns, and wherein a first column of the two or more columns corresponds to a first dataset of the plurality of datasets and a second column of the two or more columns corresponds to a second dataset of the plurality of datasets;
enriching, by the computing platform, the lineage data of the column profiles corresponding to the two or more related columns, wherein the enriching comprises aggregating database transformations associated with the two or more related columns and consolidating the column profiles corresponding to the first column and the second column into a single column profile;
obtaining, by the computing platform, information related to a new database transformation involving at least one column in the set of columns;
determining, by the computing platform, whether the new database transformation is anomalous based at least in part on a comparison of the new database transformation to the set of column profiles and a data quality analysis associated with one or more database transformations identified in the enriched lineage data that are similar to the new transformation, and wherein the comparison comprises determining that the at least one column involved in the new database transformation is related to the two or more related columns based at least in part on the consolidated single column profile, extracting the aggregated database transformations from the enriched lineage data, and comparing the aggregated database transformations with the new database transformation;
outputting, by the computing platform, an alert to a user of the computing platform comprising information that indicates the new database transformation is anomalous; and
updating, by the computing platform, the set of column profiles based on a classification of the new database transformation provided as feedback from the user in response to the alert;
wherein the method is carried out by at least one computing device.
|
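The claim recites the method steps but not the data structures behind them. The following is a minimal Python sketch under several assumptions:

- Lineage is recorded as (operation, source column, target column) records.
- Columns are named as "dataset.column" strings.
- The glossary maps a column name to its common term.
- Each past transformation carries a boolean data-quality result.

Every name here is hypothetical, including `Transformation`, `ColumnProfile`, `profile_key`, `build_profiles`, `is_anomalous`, `apply_feedback`, `quality_ok` and `min_quality`. This illustrates the claimed steps and is not the patentee's implementation. For brevity, profile generation and glossary-based consolidation happen in one pass: each column's profile is keyed on its glossary term.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transformation:
    # Hypothetical lineage record: `op` reads column `source` and writes `target`.
    # Columns are "dataset.column" strings; `quality_ok` is an assumed
    # data-quality check result recorded for this transformation.
    op: str
    source: str
    target: str
    quality_ok: bool = True


@dataclass
class ColumnProfile:
    columns: set = field(default_factory=set)            # columns consolidated here
    transformations: list = field(default_factory=list)  # derivations and usages
    accepted_ops: set = field(default_factory=set)       # ops the user marked as expected


def profile_key(column, glossary):
    # Columns assigned the same glossary term share one consolidated profile.
    return glossary.get(column, column)


def build_profiles(lineage, glossary):
    profiles = defaultdict(ColumnProfile)
    for tx in lineage:
        for col in (tx.source, tx.target):
            profiles[profile_key(col, glossary)].columns.add(col)
        # Aggregate the transformation once per distinct profile it touches.
        for key in {profile_key(tx.source, glossary), profile_key(tx.target, glossary)}:
            profiles[key].transformations.append(tx)
    return dict(profiles)


def is_anomalous(new_tx, profiles, glossary, min_quality=0.5):
    profile = profiles.get(profile_key(new_tx.source, glossary))
    if profile is None:
        return True  # no lineage history at all for this column
    if new_tx.op in profile.accepted_ops:
        return False  # previously confirmed as expected by the user
    similar = [tx for tx in profile.transformations if tx.op == new_tx.op]
    if not similar:
        return True  # operation never seen on any related column
    # Assumed data-quality analysis: flag the new transformation if too few
    # similar past transformations passed their quality checks.
    pass_rate = sum(tx.quality_ok for tx in similar) / len(similar)
    return pass_rate < min_quality


def apply_feedback(new_tx, label, profiles, glossary):
    # Fold the user's classification of an alerted transformation back in.
    profile = profiles.setdefault(profile_key(new_tx.source, glossary), ColumnProfile())
    profile.columns.add(new_tx.source)
    if label == "expected":
        profile.accepted_ops.add(new_tx.op)
        profile.transformations.append(new_tx)


glossary = {"sales.cust_id": "customer_id", "crm.customer_key": "customer_id"}
lineage = [
    Transformation("CAST_INT", "sales.cust_id", "sales.cust_id_int"),
    Transformation("JOIN_KEY", "crm.customer_key", "crm_sales.customer"),
]
profiles = build_profiles(lineage, glossary)
# CAST_INT was only seen on sales.cust_id, but the consolidated profile covers crm.customer_key.
print(is_anomalous(Transformation("CAST_INT", "crm.customer_key", "crm.key_int"), profiles, glossary))  # → False
new_tx = Transformation("SUM", "crm.customer_key", "report.total")
print(is_anomalous(new_tx, profiles, glossary))  # → True
apply_feedback(new_tx, "expected", profiles, glossary)
print(is_anomalous(new_tx, profiles, glossary))  # → False
```

Keying profiles on the glossary term is one simple way to realize the consolidation into a single column profile. Transformations seen on `sales.cust_id` then inform the check for `crm.customer_key`. A production system would likely use richer similarity than exact operation names.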