CPC G06F 16/2423 (2019.01) [G06N 20/00 (2019.01)]; 17 Claims

1. A computer-implemented method for generating input data for training a machine-learning model, the method comprising:
receiving, from a user input, signal configuration information having instructions to generate a plurality of signals from raw data, wherein the user input includes custom code to be executed using an on-the-fly operation, the custom code defining a first signal and how to generate the first signal using the raw data;
receiving signal extraction information that has instructions to query a data store;
accessing, using Structured Query Language (SQL) code that is generated based on the signal extraction information, the raw data from the data store;
processing the raw data using the signal configuration information to generate the plurality of signals;
determining that the first signal is a new signal because the first signal was not generated in a prior iteration of the plurality of signals;
determining to omit a backfilling operation of the new signal because the first signal is directly generated from the raw data;
joining, using the SQL code, the plurality of signals with a first label source to generate training data and testing data; and
processing the training data and the testing data to generate input data, the input data being an ingestible file for a machine-learning pipeline.
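The claimed steps can be illustrated with a minimal Python sketch. It is not the patentee's implementation. It assumes an in-memory SQLite database stands in for the data store and models the user's custom code as Python callables. The `SignalSpec` class, the `from_raw` flag, the helper function names, the every-fourth-id train/test split, and the CSV output format are all hypothetical choices made for illustration; none of them appears in the claim.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class SignalSpec:
    """Hypothetical stand-in for one entry of the signal configuration."""
    name: str
    compute: Callable[[dict], float]  # user-supplied code, evaluated on the fly
    from_raw: bool = True             # True if computed directly from raw columns


def extraction_sql(table: str, columns: list[str]) -> str:
    # SQL generated from (hypothetical) signal extraction information.
    return f"SELECT {', '.join(columns)} FROM {table}"


def generate_signals(conn, extraction: dict, specs: list[SignalSpec]) -> list[dict]:
    cur = conn.execute(extraction_sql(**extraction))
    names = [d[0] for d in cur.description]
    raw = [dict(zip(names, row)) for row in cur.fetchall()]
    # Keep the entity id and run every signal's code against each raw row.
    return [{"id": r["id"], **{s.name: s.compute(r) for s in specs}} for r in raw]


def signals_needing_backfill(specs: list[SignalSpec], previous: set[str]) -> list[str]:
    new = [s for s in specs if s.name not in previous]
    # A new signal derived directly from raw data is recomputed over all
    # historical rows anyway, so its backfill is omitted.
    return [s.name for s in new if not s.from_raw]


def join_and_split(conn, rows: list[dict], specs: list[SignalSpec], label_table: str):
    cols = [s.name for s in specs]
    conn.execute(f"CREATE TABLE signals (id INTEGER, {', '.join(c + ' REAL' for c in cols)})")
    placeholders = ", ".join(["?"] * (len(cols) + 1))
    conn.executemany(f"INSERT INTO signals VALUES ({placeholders})",
                     [[r["id"]] + [r[c] for c in cols] for r in rows])
    # Join the signals with the label source in SQL.
    joined = conn.execute(
        f"SELECT s.id, {', '.join('s.' + c for c in cols)}, l.label "
        f"FROM signals s JOIN {label_table} l ON s.id = l.id ORDER BY s.id"
    ).fetchall()
    # Deterministic split (an assumption): every 4th id goes to the test set.
    train = [r for r in joined if r[0] % 4 != 0]
    test = [r for r in joined if r[0] % 4 == 0]
    return ["id", *cols, "label"], train, test


def to_ingestible_csv(header: list[str], train: list, test: list) -> str:
    # A single CSV with a "split" column; the file format is an assumption.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([*header, "split"])
    writer.writerows([*row, "train"] for row in train)
    writer.writerows([*row, "test"] for row in test)
    return buf.getvalue()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, clicks INTEGER, views INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                     [(1, 2, 10), (2, 0, 5), (3, 4, 8), (4, 1, 4)])
    conn.execute("CREATE TABLE labels (id INTEGER, label INTEGER)")
    conn.executemany("INSERT INTO labels VALUES (?, ?)",
                     [(1, 1), (2, 0), (3, 1), (4, 0)])
    specs = [SignalSpec("ctr", lambda r: r["clicks"] / r["views"]),
             SignalSpec("view_count", lambda r: float(r["views"]))]
    extraction = {"table": "events", "columns": ["id", "clicks", "views"]}
    rows = generate_signals(conn, extraction, specs)
    # "ctr" is new relative to the previous iteration but comes from raw data.
    backfill = signals_needing_backfill(specs, previous={"view_count"})
    header, train, test = join_and_split(conn, rows, specs, "labels")
    return backfill, to_ingestible_csv(header, train, test)


if __name__ == "__main__":
    backfill, text = demo()
    print("backfill:", backfill)
    print(text)
```

Running the module prints an empty backfill list. `ctr` is new relative to the previous iteration, but because it is computed directly from raw data, no backfill is scheduled. A signal flagged `from_raw=False` (for example, one built from previously generated signals) would instead be returned for backfilling.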