| CPC G06F 16/23 (2019.01) [G06N 20/00 (2019.01)] | 19 Claims |

|
1. A system for maintaining bifurcated data management while labeling data for artificial intelligence model development, the system comprising:
a first datastore, wherein the first datastore is accessible to a first subset of a plurality of users, wherein the first subset of the plurality of users comprises a first attribute, wherein the first datastore comprises a first dataset, and wherein the first datastore comprises a relational database that stores a plurality of samples;
a second datastore, wherein the second datastore is specific to a first grouping of source code files, wherein the first grouping of source code files is accessible to a second subset of the plurality of users, wherein the second subset comprises a second attribute, and wherein the second datastore comprises a labeled data archive specific to the first grouping of source code files;
a third datastore, wherein the third datastore is accessible to a third subset of the plurality of users, wherein the third subset comprises a third attribute, and wherein the third datastore comprises unlabeled data sourced from the first datastore;
one or more processors; and
a non-transitory computer readable medium comprising instructions that when executed by the one or more processors cause operations comprising:
receiving a first label for a first sample from the first dataset;
receiving first version metadata of the first label, wherein the first version metadata comprises a proposed label for the first sample assigned by a first user;
determining, based on a first user input from the first user, the first grouping of source code files for storing the first version metadata of the first label;
determining a credential requirement for accessing the first grouping of source code files;
comparing the second attribute associated with the second subset of the plurality of users to the credential requirement;
based on the second attribute matching the credential requirement, determining that the first grouping of source code files is accessible to the second subset of the plurality of users;
receiving a second user input to generate training data for an artificial intelligence model based on version metadata of labels in the first grouping of source code files;
in response to the second user input, generating a second dataset for training the artificial intelligence model based on the version metadata of the labels in the first grouping of source code files; and
generating for display, in a user interface, a status notification for generation of the second dataset.
|
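
The credential comparison and dataset generation operations recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the class names `LabelVersion` and `SourceGrouping`, the functions `is_accessible` and `generate_training_data`, the string-equality credential check, and the rule that the most recent proposed label per sample wins are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class LabelVersion:
    """Version metadata of a label: a proposed label assigned by a user (hypothetical shape)."""
    sample_id: str
    proposed_label: str
    assigned_by: str


@dataclass
class SourceGrouping:
    """A grouping of source code files with its own labeled data archive (hypothetical shape)."""
    name: str
    credential_requirement: str
    label_versions: list = field(default_factory=list)


def is_accessible(user_attribute: str, grouping: SourceGrouping) -> bool:
    """Compare a user subset's attribute to the grouping's credential requirement.

    Assumption: a match is plain string equality.
    """
    return user_attribute == grouping.credential_requirement


def generate_training_data(grouping: SourceGrouping, samples: dict):
    """Build a second dataset from the label version metadata in the grouping.

    Assumption: when a sample has several versions, the last one recorded wins.
    Returns the dataset and a status message suitable for display.
    """
    latest = {}
    for version in grouping.label_versions:
        latest[version.sample_id] = version.proposed_label
    dataset = [(samples[sid], label) for sid, label in latest.items() if sid in samples]
    status = f"generated {len(dataset)} training rows from '{grouping.name}'"
    return dataset, status


# Example usage with made-up samples standing in for the first datastore.
samples = {"s1": "text a", "s2": "text b"}
grouping = SourceGrouping(name="repo-a", credential_requirement="ml-engineer")
grouping.label_versions.append(LabelVersion("s1", "spam", "alice"))
grouping.label_versions.append(LabelVersion("s1", "ham", "alice"))
grouping.label_versions.append(LabelVersion("s2", "spam", "bob"))

if is_accessible("ml-engineer", grouping):
    dataset, status = generate_training_data(grouping, samples)
    print(status)
```

In this sketch the label versions live with the grouping rather than with the samples, which mirrors the separation in the claim between the relational sample store and the labeled data archive specific to a grouping of source code files.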