| CPC G06N 3/08 (2013.01) [G06F 16/254 (2019.01); G06F 16/285 (2019.01); G06F 18/2113 (2023.01); G06F 18/214 (2023.01); H04L 67/10 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
retrieving, by a server system, data from a plurality of data sources, wherein the data from the plurality of data sources is in a plurality of different formats;
converting the data from the plurality of data sources to a common schema;
training, using training data comprising a set of predefined label suggestions, a machine learning classifier to output recommended label suggestions;
providing, as input to the trained machine learning classifier, the converted data in the common schema;
receiving, as output from the trained machine learning classifier, a plurality of label suggestions associated with the converted data;
initiating, by the server system, a session in a virtual environment hosting a first data labeling tool and a second data labeling tool for real time data labeling;
sending, to the first data labeling tool, a first subset of the converted data and the corresponding label suggestions associated with the first subset;
sending, to the second data labeling tool, a second subset of the converted data and the corresponding label suggestions associated with the second subset, wherein the first data labeling tool and the second data labeling tool are selected based on different data volumes associated with the first subset and the second subset of the converted data;
receiving, from the first data labeling tool and the second data labeling tool, a plurality of labels associated with the first subset and the second subset, wherein the first data labeling tool and the second data labeling tool are associated with different data formats;
exporting, to a label database, the plurality of labels and the converted data in the common schema; and
after exporting the plurality of labels, terminating the session in the virtual environment.
|
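
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Record` schema, the `from_json` and `from_csv` converters, the tool names, and the volume threshold are all hypothetical, and `suggest_label` is a keyword rule standing in for the trained machine learning classifier. The virtual-environment session and the label database are reduced to a function call and an in-memory list.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical common schema: one text item plus its origin."""
    source: str
    text: str


def from_json(source: str, payload: str) -> list[Record]:
    # Assumes a JSON array of objects that each carry a "text" field.
    return [Record(source, item["text"]) for item in json.loads(payload)]


def from_csv(source: str, payload: str) -> list[Record]:
    # Assumes CSV input with a header row that contains a "text" column.
    reader = csv.DictReader(io.StringIO(payload))
    return [Record(source, row["text"]) for row in reader]


def suggest_label(record: Record) -> str:
    # Placeholder for the trained classifier: a keyword rule, not a model.
    return "complaint" if "refund" in record.text.lower() else "other"


def pick_tool(subset: list[Record], threshold: int = 2) -> str:
    # Volume-based tool selection; names and threshold are illustrative only.
    return "bulk_tool" if len(subset) > threshold else "interactive_tool"


def run_session(subsets: list[list[Record]]) -> list[dict]:
    label_db = []  # in-memory stand-in for the label database
    for subset in subsets:
        tool = pick_tool(subset)
        for record in subset:
            suggestion = suggest_label(record)
            # A real labeling tool would return a reviewed label;
            # this sketch accepts the suggestion unchanged.
            label_db.append({
                "tool": tool,
                "source": record.source,
                "text": record.text,
                "label": suggestion,
            })
    return label_db  # returning models the end of the session


json_records = from_json("tickets", '[{"text": "Please refund my order"}]')
csv_records = from_csv("emails", "text\nHello there\nRefund requested\nThanks\n")
labels = run_session([json_records, csv_records])
```

In this sketch the single-record JSON subset is routed to the hypothetical `interactive_tool`, while the three-record CSV subset exceeds the threshold and goes to `bulk_tool`, mirroring the claim's selection of tools based on differing data volumes.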