US 12,332,859 B2
Generating datastore checkpoints
Jacob Jensen, Metuchen, NJ (US)
Assigned to Maplebear Inc., San Francisco, CA (US)
Filed by Maplebear Inc., San Francisco, CA (US)
Filed on Jun. 14, 2022, as Appl. No. 17/840,454.
Prior Publication US 2023/0401186 A1, Dec. 14, 2023
Int. Cl. G06F 16/22 (2019.01); G06F 16/2458 (2019.01)
CPC G06F 16/22 (2019.01) [G06F 16/2477 (2019.01)]
18 Claims
 
1. A method comprising:
generating, by an online system and within a datastore of the online system, raw data comprising records representing values of a set of input features for a machine-learning model over time, wherein the raw data comprises a first set of records and a second set of records, wherein the first set of records is associated with a plurality of checkpoints, each checkpoint of the plurality of checkpoints indicating a value of an input feature of the set of input features at a corresponding point in time, and wherein each record of the second set of records indicates a relative change in an input feature of the set of input features at a corresponding point in time;
querying, by the online system, the datastore to produce a set of records based on one or more criteria of a query, wherein the one or more criteria of the query comprise a plurality of target points in time, wherein each target point in time corresponds to an interaction by a user with the online system;
generating, by the online system and based on the set of records, a plurality of training examples for training the machine-learning model, wherein each training example represents an interaction by a user with the online system and comprises a value for each feature of the set of input features, and wherein generating a training example of the plurality of training examples comprises:
identifying a record of the first set of records corresponding to a checkpoint of the plurality of checkpoints with a closest point in time to the target point in time corresponding to the interaction represented by the training example;
identifying a subset of records of the second set of records with points in time between the point in time of the identified record and the target point in time;
computing a value for each feature of the set of input features at the target point in time by, for each feature of the set of input features:
identifying records of the subset of the second set of records that store values for the feature; and
computing a value of the feature for the training example based on the values stored in the identified records and a value for the feature stored in the identified record of the first set of records;
generating a label for the training example based on the interaction represented by the training example; and
generating the training example for the machine-learning model comprising the computed value of each feature of the set of input features and the generated label; and
training, by the online system, the machine-learning model based on the generated plurality of training examples by:
applying the machine-learning model to each of the plurality of training examples to generate an output;
comparing the outputs generated by the machine-learning model to the generated label of each of the corresponding training examples; and
updating parameters of the machine-learning model based on the comparison of the generated outputs to the generated labels.
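
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: a "relative change" is treated as an additive delta, the "closest" checkpoint is taken to be the latest one at or before the target time, the model is a simple linear regressor trained by stochastic gradient descent, and all names (Checkpoint, Delta, features_at, train, the sample feature names) are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """First set of records: absolute feature values at a point in time."""
    time: int
    values: dict  # feature name -> value


@dataclass(frozen=True)
class Delta:
    """Second set of records: a relative change (assumed additive) to one feature."""
    time: int
    feature: str
    change: float


def features_at(target_time, checkpoints, deltas, feature_names):
    """Rebuild feature values at target_time; checkpoints must be sorted by time."""
    times = [c.time for c in checkpoints]
    idx = bisect_right(times, target_time) - 1  # assumption: latest checkpoint <= target
    if idx < 0:
        raise ValueError("no checkpoint at or before the target time")
    base = checkpoints[idx]
    # Deltas recorded after the checkpoint and up to the target time.
    window = [d for d in deltas if base.time < d.time <= target_time]
    return {
        name: base.values[name] + sum(d.change for d in window if d.feature == name)
        for name in feature_names
    }


def predict(weights, bias, features):
    """Apply the (assumed linear) model to one set of feature values."""
    return bias + sum(weights[name] * value for name, value in features.items())


def mse(weights, bias, examples):
    """Mean squared difference between model outputs and labels."""
    return sum((predict(weights, bias, f) - y) ** 2 for f, y in examples) / len(examples)


def train(examples, feature_names, lr=1e-4, epochs=200):
    """Apply the model, compare output with label, update parameters."""
    weights = {name: 0.0 for name in feature_names}
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, bias, features) - label  # compare output to label
            for name in feature_names:
                weights[name] -= lr * error * features[name]  # gradient step
            bias -= lr * error
    return weights, bias


FEATURES = ["views", "orders"]
CHECKPOINTS = [
    Checkpoint(0, {"views": 10.0, "orders": 2.0}),
    Checkpoint(100, {"views": 50.0, "orders": 5.0}),
]
DELTAS = [
    Delta(20, "views", 3.0),
    Delta(40, "orders", 1.0),
    Delta(120, "views", 4.0),
    Delta(130, "views", -1.0),
    Delta(150, "orders", 2.0),
]
# Each interaction supplies a target point in time and a label.
INTERACTIONS = [(135, 1.0), (50, 0.0)]
EXAMPLES = [(features_at(t, CHECKPOINTS, DELTAS, FEATURES), y) for t, y in INTERACTIONS]
WEIGHTS, BIAS = train(EXAMPLES, FEATURES)
```

In this sketch, rebuilding the features for the interaction at time 135 reads only the checkpoint at time 100 and the two deltas after it, rather than replaying every change since time 0. Replaying fewer records is the practical reason to store periodic checkpoints alongside relative-change records.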