| CPC G06F 16/9024 (2019.01) [G06F 16/2379 (2019.01); G06F 16/27 (2019.01)] | 18 Claims |

|
1. A method comprising:
creating and storing a dependency graph in memory, based on which a data pipeline is maintained, the dependency graph representing at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
reading configuration data specifying one or more periods for one or more datasets in the dependency graph;
detecting, at an unscheduled time, a first update to a first dataset among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
determining, in response to the first update, that a current time is within a first period of the one or more periods from a fixed time of a day or a previous build of a first intermediate derived dataset occurred earlier than the current time less a second period of the one or more periods;
initiating, in response to the determining, at or near the current time, a first build of the first intermediate derived dataset that depends on the first dataset;
detecting that a frequency of updates to a dataset on which the first intermediate derived dataset depends exceeds a threshold;
in response to the detecting of the threshold being exceeded, updating the configuration data to revise the first period or the second period;
asynchronously detecting a second update to a second dataset among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
initiating, in response to the second update, a second build of a second intermediate derived dataset that depends on the second dataset without waiting for the first update to propagate through the dependency graph;
detecting and initiating continuously as other updates to other datasets are received,
wherein the method is performed using one or more processors.
|
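
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. All names (`PeriodConfig`, `dependents`, `should_build`, `on_update`, `revise_if_frequent`) are hypothetical, as are the one-hour counting window, the threshold of 10 updates, and the choice to double the second period when the threshold is exceeded; the claim only says the first or second period is revised. The sketch also assumes that "within a first period from a fixed time of a day" means at or after the fixed time on the same day.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Dict, List, Optional


@dataclass
class PeriodConfig:
    """Hypothetical per-dataset configuration holding the claimed periods."""
    fixed_time: time          # fixed time of day that anchors the first period
    first_period: timedelta   # window after fixed_time in which builds may start
    second_period: timedelta  # minimum age of the previous build before a rebuild


def dependents(graph: Dict[str, List[str]], dataset: str) -> List[str]:
    """Return datasets that list `dataset` as a direct upstream dependency."""
    return [name for name, upstream in graph.items() if dataset in upstream]


def should_build(now: datetime, last_build: Optional[datetime],
                 cfg: PeriodConfig) -> bool:
    """Either condition of the determining step is enough to start a build."""
    anchor = datetime.combine(now.date(), cfg.fixed_time)
    # Assumption: the window opens at the fixed time and extends forward.
    in_window = timedelta(0) <= now - anchor <= cfg.first_period
    # Previous build occurred earlier than (current time - second period).
    stale = last_build is None or last_build < now - cfg.second_period
    return in_window or stale


def on_update(graph: Dict[str, List[str]], dataset: str, now: datetime,
              last_builds: Dict[str, datetime],
              configs: Dict[str, PeriodConfig]) -> List[str]:
    """Handle one update independently of any other update still propagating."""
    started = []
    for target in dependents(graph, dataset):
        if should_build(now, last_builds.get(target), configs[target]):
            last_builds[target] = now  # stand-in for launching a real build
            started.append(target)
    return started


def revise_if_frequent(update_times: List[datetime], now: datetime,
                       cfg: PeriodConfig,
                       window: timedelta = timedelta(hours=1),
                       threshold: int = 10) -> bool:
    """Revise the configuration when updates in `window` exceed `threshold`."""
    recent = [t for t in update_times if timedelta(0) <= now - t <= window]
    if len(recent) > threshold:
        cfg.second_period *= 2  # assumed revision policy: rebuild less often
        return True
    return False
```

In this sketch each call to `on_update` consults only the direct dependents of the updated dataset, so a second update can trigger its own build without waiting for an earlier update to finish propagating through the graph.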