| CPC G06F 16/22 (2019.01) [G06F 16/2477 (2019.01)] | 18 Claims |

|
1. A method comprising:
generating, by an online system and within a datastore of the online system, raw data comprising records representing values of a set of input features for a machine-learning model over time, wherein the raw data comprises a first set of records and a second set of records, wherein the first set of records are associated with a plurality of checkpoints, each checkpoint of the plurality of checkpoints indicating a value of an input feature of the set of input features at a corresponding point in time, and wherein each record of the second set of records indicates a relative change in an input feature of the set of input features at a corresponding point in time;
querying, by the online system, the datastore to produce a set of records based on one or more criteria of a query, wherein the one or more criteria of the query comprise a plurality of target points in time, wherein each target point in time corresponds to an interaction by a user with the online system;
generating, by the online system and based on the set of records, a plurality of training examples for training the machine-learning model, wherein each training example represents an interaction by a user with the online system and comprises a value for each feature of the set of input features, and wherein generating a training example of the plurality of training examples comprises:
identifying a record of the first set of records corresponding to a checkpoint of the plurality of checkpoints with a closest point in time to the target point in time corresponding to the interaction represented by the training example;
identifying a subset of records of the second set of records with points in time between the point in time of the identified record and the target point in time;
computing a value for each feature of the set of input features at the target point in time by, for each feature of the set of input features:
identifying records of the subset of the second set of records that store values for the feature; and
computing a value of the feature for the training example based on the values stored in the identified records and a value for the feature stored in the identified record of the first set of records;
generating a label for the training example based on the interaction represented by the training example; and
generating the training example for the machine-learning model comprising the computed value of each feature of the set of input features and the generated label; and
training, by the online system, the machine-learning model based on the generated plurality of training examples by:
applying the machine-learning model to each of the plurality of training examples to generate an output;
comparing the generated outputs by the machine-learning model to the generated label of each of the corresponding training examples; and
updating parameters of the machine-learning model based on the comparison of the generated outputs to the generated labels.
|
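
The training-example generation recited in claim 1 (identify a checkpoint record, collect the relative-change records between it and the target point in time, then combine them per feature and attach a label) can be illustrated with a minimal Python sketch. This is not the claimed implementation. The names `Checkpoint`, `Delta`, `features_at`, and `build_example`, the `"time"` and `"converted"` interaction fields, and the integer timestamps are all hypothetical. The sketch further assumes that the "closest" checkpoint is the latest one at or before the target point in time, that relative changes are additive, and that the label is binary. The training steps of the claim are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record of the first set: absolute feature values at a time."""
    time: int
    values: dict  # feature name -> value at `time`


@dataclass(frozen=True)
class Delta:
    """Hypothetical record of the second set: relative change in one feature."""
    time: int
    feature: str
    change: float  # assumed additive


def features_at(checkpoints, deltas, target_time):
    """Reconstruct every feature value at target_time."""
    # Assumption: "closest" means the latest checkpoint at or before the target.
    candidates = [c for c in checkpoints if c.time <= target_time]
    if not candidates:
        raise ValueError("no checkpoint at or before the target point in time")
    base = max(candidates, key=lambda c: c.time)

    # Subset of change records between the checkpoint and the target.
    window = [d for d in deltas if base.time < d.time <= target_time]

    values = {}
    for feature, start in base.values.items():
        # Change records in the subset that store values for this feature.
        changes = [d.change for d in window if d.feature == feature]
        values[feature] = start + sum(changes)
    return values


def build_example(checkpoints, deltas, interaction):
    """Return (features, label) for one user interaction."""
    features = features_at(checkpoints, deltas, interaction["time"])
    # Assumption: a binary label derived from a hypothetical "converted" flag.
    label = 1 if interaction["converted"] else 0
    return features, label


checkpoints = [
    Checkpoint(0, {"orders": 10.0, "views": 100.0}),
    Checkpoint(10, {"orders": 14.0, "views": 130.0}),
]
deltas = [
    Delta(3, "orders", 2.0),
    Delta(7, "views", 30.0),
    Delta(8, "orders", 2.0),
    Delta(12, "orders", 1.0),
    Delta(15, "views", 5.0),
    Delta(20, "orders", 3.0),
]
example = build_example(checkpoints, deltas, {"time": 15, "converted": True})
```

Under these assumptions, an interaction at time 15 is resolved from the checkpoint at time 10 plus the two change records at times 12 and 15, so the earlier change records are never read. Under a reading in which the closest checkpoint may fall after the target point in time, the intervening changes would have to be subtracted instead of added.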