US 11,727,285 B2
Method and server for managing a dataset in the context of artificial intelligence
Frédéric Branchaud-Charron, Montreal (CA); Parmida Atighehchian, Montreal (CA); Jan Freyberg, Montreal (CA); and Lorne Schell, Montreal (CA)
Assigned to ServiceNow Canada Inc., Montreal (CA)
Filed by Element AI Inc., Montreal (CA)
Filed on Jan. 31, 2020, as Appl. No. 16/779,481.
Prior Publication US 2021/0241135 A1, Aug. 5, 2021
Int. Cl. G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06N 7/01 (2023.01); G06F 16/28 (2019.01)
CPC G06N 5/04 (2013.01) [G06F 16/285 (2019.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)]
14 Claims
1. A method for managing a dataset, the dataset comprising data items, labelling tasks associated to the data items and labels corresponding to answers to the labelling tasks, the method comprising:
determining an artificial intelligence (AI) model to be used on the dataset;
creating a labeling status mask describing a labeling status of the data items of the dataset; the labeling status of each data item indicating if the label corresponding to the answer to the labeling task for said data item is received, and
repeating a loop, until patience parameters are satisfied:
receiving one or more trusted labels provided by one or more trusted data labelers;
updating the labeling status mask by changing the labeling status of the data items for which a trusted label is received;
from a labelled data items subset obtained using the labeling status mask, training the AI model;
cloning the trained AI model into a local AI model on the processing nodes;
from an unlabelled data items subset obtained using the labeling status mask, creating a randomized unlabeled subset having fewer members than the unlabelled data items subset;
at a cluster manager server, chunking the randomized unlabeled subset into a plurality of data subsets, and dispatching the chunked data subsets to one or more of the processing nodes;
at the cluster manager server, receiving an indication that one or more predicted label answers have been inferred by the one or more processing nodes using the local AI model; and
computing a model uncertainty measurement from statistical analysis of the one or more predicted label answers;
wherein the patience parameters include one or more of: a threshold value on the model uncertainty measurement and information gain between different training cycles.
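
The loop recited in claim 1 can be illustrated with a minimal single-process sketch. It is not the patented implementation. Every name in it is hypothetical (`CentroidModel`, `run_loop`, `oracle`, `sample_size`, `n_nodes`, `batch`, `min_gain`), and it rests on several assumptions. A toy nearest-centroid classifier stands in for the AI model. The processing nodes are simulated by an in-process loop over chunks. Mean predictive entropy is used as the model uncertainty measurement. The change in that measurement between cycles is used as a proxy for information gain. The rule for picking the next items to send to the trusted labelers is also an assumption, since the claim does not specify one.

```python
import copy
import numpy as np

class CentroidModel:
    """Toy stand-in for the AI model (assumption): nearest-centroid classifier."""
    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.centroids = None

    def fit(self, X, y):
        self.centroids = np.zeros((self.n_classes, X.shape[1]))
        for c in range(self.n_classes):
            if np.any(y == c):
                self.centroids[c] = X[y == c].mean(axis=0)
        return self

    def predict_proba(self, X):
        # Softmax over negative distances to each class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        logits = -d
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

def entropy(probs):
    """Per-item predictive entropy (the assumed uncertainty statistic)."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def run_loop(X, oracle, n_classes, sample_size=50, n_nodes=4, batch=10,
             uncertainty_threshold=0.2, min_gain=1e-3, max_cycles=100, seed=0):
    rng = np.random.default_rng(seed)
    labeled_mask = np.zeros(len(X), dtype=bool)   # labeling status mask
    labels = np.full(len(X), -1)
    model = CentroidModel(n_classes)
    history = []                                   # uncertainty per cycle
    to_label = rng.choice(len(X), size=batch, replace=False)

    for _ in range(max_cycles):
        # Receive trusted labels and update the labeling status mask.
        labels[to_label] = oracle(to_label)
        labeled_mask[to_label] = True

        # Train on the labelled subset selected by the mask.
        model.fit(X[labeled_mask], labels[labeled_mask])

        unlabeled_idx = np.flatnonzero(~labeled_mask)
        if len(unlabeled_idx) == 0:
            break

        # Randomized unlabeled subset, normally smaller than the full pool.
        k = min(sample_size, len(unlabeled_idx))
        subset = rng.choice(unlabeled_idx, size=k, replace=False)

        # "Cluster manager": chunk the subset and dispatch to simulated nodes,
        # each of which infers with its own clone of the trained model.
        chunks = np.array_split(subset, n_nodes)
        node_outputs = []
        for chunk in chunks:
            if len(chunk) == 0:
                continue
            local_model = copy.deepcopy(model)
            node_outputs.append(local_model.predict_proba(X[chunk]))
        probs = np.vstack(node_outputs)            # same order as `subset`

        # Model uncertainty measurement from the predicted answers.
        item_entropy = entropy(probs)
        uncertainty = float(item_entropy.mean())
        gain = history[-1] - uncertainty if history else np.inf
        history.append(uncertainty)

        # Patience parameters: uncertainty threshold or negligible gain.
        if uncertainty < uncertainty_threshold or abs(gain) < min_gain:
            break

        # Assumed selection rule: label the most uncertain items next.
        to_label = subset[np.argsort(item_entropy)[::-1][:batch]]

    return labeled_mask, history
```

Here `oracle` plays the role of the trusted data labelers: it takes an array of item indices and returns their labels. In a real deployment the chunks would be sent to separate processing nodes rather than iterated over in one process.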