CPC G06F 16/3329 (2019.01) [G06F 16/355 (2019.01)] | 17 Claims |
1. A method, comprising:
obtaining a dataset having one or more columns, each of the one or more columns includes a title and at least one value;
obtaining a task description corresponding to a machine learning model to be trained using the dataset;
extracting, for each of the one or more columns, the title and a sample value lacking a corresponding unit of measurement from the at least one value;
synthesizing a question to elicit, from a language model, an answer indicating a predicted unit of measurement predicting the corresponding unit of measurement associated with the sample value from each of the one or more columns, the question synthesized based on:
the title for each of the one or more columns;
the sample value for each of the one or more columns; and
the task description corresponding to the machine learning model to be trained using the dataset;
sending the question to the language model to obtain an answer; and
generating, from the answer to the question, the predicted unit of measurement for the at least one value in each of the one or more columns;
adding the predicted unit of measurement to the at least one value in each of the one or more columns in the dataset; and
training the machine learning model using the dataset including the added predicted unit of measurement.
|
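The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the dataset layout (a dict mapping each column title to a list of string values), the prompt wording, the `title: unit` answer format, and every function name below are assumptions made for illustration. The language model is passed in as a plain callable so that no particular model API is implied, and the final training step is left out because the claim does not specify the model.

```python
from typing import Callable, Dict, List

# Assumed layout: column title -> list of values stored as strings without units.
Dataset = Dict[str, List[str]]


def synthesize_question(dataset: Dataset, task_description: str) -> str:
    """Build one question from each column's title, a sample value, and the task."""
    lines = [
        f"Task: {task_description}",
        "For each column below, give the most likely unit of measurement.",
        "Answer with one line per column in the form 'title: unit'.",
    ]
    for title, values in dataset.items():
        # The claim requires at least one value per column, so values[0] exists.
        lines.append(f"- {title} (sample value: {values[0]})")
    return "\n".join(lines)


def parse_answer(answer: str, dataset: Dataset) -> Dict[str, str]:
    """Extract a predicted unit per column from 'title: unit' lines.

    Lines that do not name a known column are ignored. Titles that
    themselves contain a colon are not handled by this sketch.
    """
    units: Dict[str, str] = {}
    for line in answer.splitlines():
        title, sep, unit = line.partition(":")
        title = title.strip().lstrip("- ").strip()
        unit = unit.strip()
        if sep and unit and title in dataset:
            units[title] = unit
    return units


def add_units(dataset: Dataset, units: Dict[str, str]) -> Dataset:
    """Return a copy of the dataset with the predicted unit appended to each value."""
    return {
        title: [f"{v} {units[title]}" if title in units else v for v in values]
        for title, values in dataset.items()
    }


def annotate(dataset: Dataset, task_description: str,
             ask_model: Callable[[str], str]) -> Dataset:
    """Question synthesis, model query, answer parsing, and unit insertion."""
    question = synthesize_question(dataset, task_description)
    answer = ask_model(question)  # hypothetical callable wrapping a language model
    return add_units(dataset, parse_answer(answer, dataset))


# Usage with a stand-in for the language model (a real model call would go here).
def fake_model(question: str) -> str:
    return "height: cm\nweight: kg"


data = {"height": ["170", "182"], "weight": ["65", "80"]}
annotated = annotate(data, "predict body-mass index", fake_model)
print(annotated)
# {'height': ['170 cm', '182 cm'], 'weight': ['65 kg', '80 kg']}
```

The annotated dataset would then be passed to whatever training routine the task calls for. Columns for which the model returns no usable line are left unchanged in this sketch, which is a design choice of the example rather than something the claim states.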