US 12,147,411 B2
Systems and methods for maintaining bifurcated data management while labeling data for artificial intelligence model development
Tania Cruz Morales, Washington, DC (US); Purva Shanker, Arlington, VA (US); Shannon Yogerst, New York, NY (US); Ignacio Espino, Arlington, VA (US); Dan Lin, Arlington, VA (US); and Nathan Wolfe, Silver Spring, MD (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Jan. 18, 2023, as Appl. No. 18/156,140.
Prior Publication US 2024/0241875 A1, Jul. 18, 2024
Int. Cl. G06F 16/00 (2019.01); G06F 16/23 (2019.01); G06N 20/00 (2019.01)

CPC G06F 16/23 (2019.01) [G06N 20/00 (2019.01)]

19 Claims

1. A system for maintaining bifurcated data management while labeling data for artificial intelligence model development, the system comprising:

a first datastore, wherein the first datastore is accessible to a first subset of a plurality of users, wherein the first subset of the plurality of users comprises a first attribute, wherein the first datastore comprises a first dataset, and wherein the first datastore comprises a relational database that stores a plurality of samples;

a second datastore, wherein the second datastore is specific to a first grouping of source code files, wherein the first grouping of source code files is accessible to a second subset of the plurality of users, wherein the second subset comprises a second attribute, and wherein the second datastore comprises a labeled data archive specific to the first grouping of source code files;

a third datastore, wherein the third datastore is accessible to a third subset of the plurality of users, wherein the third subset comprises a third attribute, and wherein the third datastore comprises unlabeled data sourced from the first datastore;

one or more processors; and

a non-transitory computer readable medium comprising instructions that when executed by the one or more processors cause operations comprising:

receiving a first label for a first sample from the first dataset;

receiving first version metadata of the first label, wherein the first version metadata comprises a proposed label for the first sample assigned by a first user;

determining, based on a first user input from the first user, the first grouping of source code files for storing the first version metadata of the first label;

determining a credential requirement for accessing the first grouping of source code files;

comparing the second attribute associated with the second subset of the plurality of users to the credential requirement;

based on the second attribute matching the credential requirement, determining that the first grouping of source code files is accessible to the second subset of the plurality of users;

receiving a second user input to generate training data for an artificial intelligence model based on version metadata of labels in the first grouping of source code files;

in response to the second user input, generating a second dataset for training the artificial intelligence model based on the version metadata of the labels in the first grouping of source code files; and

generating for display, in a user interface, a status notification for generation of the second dataset.
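
For illustration only, the operations recited in claim 1 might be sketched as follows. This is a minimal, non-authoritative sketch and not the claimed implementation. The names `LabelVersion`, `SourceCodeGrouping`, `is_accessible`, `generate_training_data`, and `status_notification` are hypothetical. The claim does not define what "matching" a credential requirement means or how version metadata is resolved into training data, so the sketch assumes simple equality for the match and assumes that the most recently recorded proposed label for each sample wins.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LabelVersion:
    """Version metadata of a label: a proposed label for a sample, assigned by a user."""
    sample_id: str
    proposed_label: str
    user: str


@dataclass
class SourceCodeGrouping:
    """Hypothetical stand-in for a grouping of source code files and its labeled data archive."""
    name: str
    credential_requirement: str
    label_versions: List[LabelVersion] = field(default_factory=list)


def is_accessible(grouping: SourceCodeGrouping, user_attribute: str) -> bool:
    # Assumption: an attribute "matches" the credential requirement when the two are equal.
    return user_attribute == grouping.credential_requirement


def generate_training_data(
    grouping: SourceCodeGrouping, samples: Dict[str, str]
) -> List[Tuple[str, str]]:
    # Assumption: later entries in label_versions supersede earlier ones for the same sample.
    latest: Dict[str, str] = {}
    for version in grouping.label_versions:
        latest[version.sample_id] = version.proposed_label
    # Pair each labeled sample's content with its resolved label, skipping unknown sample ids.
    return [(samples[sid], label) for sid, label in latest.items() if sid in samples]


def status_notification(grouping: SourceCodeGrouping, dataset: List[Tuple[str, str]]) -> str:
    # Text a user interface could display once the second dataset has been generated.
    return f"Generated {len(dataset)} training examples from '{grouping.name}'."
```

In this sketch, the labeled data archive lives with the grouping of source code files, so access to proposed labels is governed by the grouping's credential requirement rather than by access to the first datastore, which is one way to read the "bifurcated" arrangement described in the claim.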