CPC G06F 18/23 (2023.01) [G06F 18/2155 (2023.01); G06N 20/00 (2019.01); G06V 10/751 (2022.01)] | 20 Claims |
1. A computing platform comprising:
at least one processor;
a communication interface communicatively coupled to the at least one processor; and
memory storing computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:
receive, from one or more data sources, a labelled data set;
apply, to the labelled data set, an unsupervised learning algorithm, wherein applying the unsupervised learning algorithm results in a clustered data set corresponding to the labelled data set, wherein applying the unsupervised learning algorithm comprises:
applying, as a first unsupervised learning algorithm, hierarchical clustering,
applying, after the hierarchical clustering and as a second unsupervised learning algorithm, k-means, and
applying, after the k-means and as a third unsupervised learning algorithm, a mixture model;
compare, for each data point in the labelled data set, corresponding clustering information associated with the clustered data set and labelling information associated with the labelled data set to identify discrepancies between the corresponding clustering information and labelling information for each data point;
flag, for data points with identified discrepancies between the corresponding clustering information and labelling information, a data labelling error;
grade, based on the flagged data labelling errors, each of the one or more data sources;
send, to a user device, the grades and one or more commands directing the user device to display the grades, wherein sending the one or more commands directing the user device to display the grades causes the user device to display the grades;
train, using remaining data of the labelled data set, not flagged with data labelling errors, a supervised learning model, wherein training the supervised learning model comprises training the supervised learning model by weighting the remaining data based on:
a corresponding data source, of the one or more data sources corresponding to each data point of the remaining data, and
the grades assigned to each of the one or more data sources; and
store, in the memory, the trained supervised learning model.
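For illustration only, the compare, flag, grade, and weight steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The cluster assignments are taken as given (the claimed chain of hierarchical clustering, k-means, and a mixture model is not reproduced here). The function names, the majority-vote rule for detecting a discrepancy, and the grade formula (fraction of a source's data points left unflagged) are all hypothetical choices that the claim does not specify.

```python
from collections import Counter, defaultdict


def flag_label_errors(labels, clusters):
    """Flag points whose label differs from the majority label of their cluster.

    Assumption: a "discrepancy" means disagreeing with the cluster majority.
    Ties are broken by first occurrence (Counter.most_common ordering).
    """
    labels_by_cluster = defaultdict(list)
    for label, cluster in zip(labels, clusters):
        labels_by_cluster[cluster].append(label)
    majority = {
        cluster: Counter(members).most_common(1)[0][0]
        for cluster, members in labels_by_cluster.items()
    }
    return [label != majority[cluster] for label, cluster in zip(labels, clusters)]


def grade_sources(sources, flags):
    """Grade each source by the fraction of its points that were NOT flagged.

    Assumption: grades lie in [0, 1], with 1.0 meaning no flagged errors.
    """
    total = Counter(sources)
    flagged = Counter(src for src, is_flagged in zip(sources, flags) if is_flagged)
    return {src: 1.0 - flagged[src] / total[src] for src in total}


def training_weights(sources, flags, grades):
    """Return (index, weight) pairs for the remaining (unflagged) points.

    Each remaining point is weighted by the grade of its own data source.
    """
    return [
        (index, grades[src])
        for index, (src, is_flagged) in enumerate(zip(sources, flags))
        if not is_flagged
    ]


# Hypothetical toy data: six labelled points from two sources, "A" and "B".
labels = ["cat", "cat", "dog", "dog", "dog", "cat"]
clusters = [0, 0, 0, 1, 1, 1]  # assumed output of the clustering steps
sources = ["A", "A", "B", "B", "A", "B"]

flags = flag_label_errors(labels, clusters)
grades = grade_sources(sources, flags)
weights = training_weights(sources, flags, grades)

print(flags)
print(grades)
print(weights)
```

The resulting weights could then be passed to any supervised learner that accepts per-sample weights; which learner is used, and how the grades are displayed on a user device, are left open by this sketch.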