US 12,229,189 B2
Continuous builds of derived datasets in response to other dataset updates
Daniel Deutsch, New York, NY (US); Kyle Solan, San Francisco, CA (US); Thomas Mathew, New York, NY (US); and Vasil Vasilev, Cambridge, MA (US)
Assigned to Palantir Technologies Inc., Denver, CO (US)
Filed by Palantir Technologies Inc., Palo Alto, CA (US)
Filed on May 26, 2022, as Appl. No. 17/826,099.
Application 17/826,099 is a continuation of application No. 15/963,038, filed on Apr. 25, 2018, granted, now Pat. No. 11,379,525.
Claims priority of provisional application 62/589,856, filed on Nov. 22, 2017.
Prior Publication US 2022/0284057 A1, Sep. 8, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/901 (2019.01); G06F 16/23 (2019.01); G06F 16/27 (2019.01)
CPC G06F 16/9024 (2019.01) [G06F 16/2379 (2019.01); G06F 16/27 (2019.01)]
18 Claims
 
1. A method comprising:
creating and storing a dependency graph in memory, based on which a data pipeline is maintained,
the dependency graph representing at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
reading configuration data specifying one or more periods for one or more datasets in the dependency graph;
detecting, at an unscheduled time, a first update to a first dataset among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
determining, in response to the first update, that a current time is within a first period of the one or more periods from a fixed time of a day or a previous build of a first intermediate derived dataset occurred earlier than the current time less a second period of the one or more periods;
initiating, in response to the determining, at or near the current time, a first build of the first intermediate derived dataset that depends on the first dataset;
detecting that a frequency of updates to a dataset on which the first intermediate derived dataset depends exceeds a threshold;
in response to the detecting of the threshold being exceeded, updating the configuration data to revise the first period or the second period;
asynchronously detecting a second update to a second dataset among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
initiating, in response to the second update, a second build of a second intermediate derived dataset that depends on the second dataset without waiting for the first update to propagate through the dependency graph;
detecting and initiating continuously as other updates to other datasets are received,
wherein the method is performed using one or more processors.
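
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. All names (`BuildConfig`, `should_build`, `direct_dependents`, `on_update`, `revise_for_frequency`) are hypothetical. The sketch assumes that "within a first period from a fixed time of a day" means the window that starts at the fixed time on the current day, that a dataset with no previous build counts as stale, and that the configuration is revised by lengthening the second period; the claim does not specify any of these details.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Dict, List, Optional, Set


@dataclass
class BuildConfig:
    """Hypothetical per-dataset configuration holding the two periods."""
    fixed_time: time          # fixed time of day that anchors the first period
    first_period: timedelta   # window after fixed_time in which a build may start
    second_period: timedelta  # minimum age of the previous build before a rebuild


def should_build(now: datetime, last_build: Optional[datetime],
                 cfg: BuildConfig) -> bool:
    """Return True if either condition of the determining step holds."""
    anchor = datetime.combine(now.date(), cfg.fixed_time)
    # Condition 1 (assumed reading): now lies in [anchor, anchor + first_period].
    in_window = timedelta(0) <= now - anchor <= cfg.first_period
    # Condition 2: the previous build is older than now minus the second period.
    # Assumption: a dataset that was never built is treated as stale.
    stale = last_build is None or last_build < now - cfg.second_period
    return in_window or stale


def direct_dependents(depends_on: Dict[str, Set[str]], updated: str) -> List[str]:
    """List datasets whose direct inputs include the updated dataset."""
    return sorted(d for d, inputs in depends_on.items() if updated in inputs)


def on_update(updated: str, now: datetime,
              depends_on: Dict[str, Set[str]],
              last_builds: Dict[str, datetime],
              configs: Dict[str, BuildConfig]) -> List[str]:
    """Handle one detected update and return the datasets whose build started.

    Each update is handled independently, so a second update does not wait
    for an earlier one to propagate through the whole graph.
    """
    started = []
    for dataset in direct_dependents(depends_on, updated):
        if should_build(now, last_builds.get(dataset), configs[dataset]):
            last_builds[dataset] = now  # stand-in for launching a real build
            started.append(dataset)
    return started


def revise_for_frequency(cfg: BuildConfig, update_times: List[datetime],
                         now: datetime, window: timedelta,
                         threshold: int, factor: int = 2) -> BuildConfig:
    """Lengthen the second period when updates in the window exceed the threshold.

    Assumption: the revision multiplies second_period by a fixed factor.
    """
    recent = sum(1 for t in update_times if timedelta(0) <= now - t <= window)
    if recent > threshold:
        cfg.second_period = cfg.second_period * factor
    return cfg
```

In this sketch the dependency graph is a plain dictionary mapping each derived dataset to the set of datasets it reads, for example `{"clean": {"raw_a"}, "joined": {"raw_b", "clean"}}`. Calling `on_update("raw_a", ...)` and `on_update("raw_b", ...)` back to back starts builds of `clean` and `joined` independently, which corresponds to the second build being initiated without waiting for the first update to propagate.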