US 12,007,979 B2
Systems and methods for data consistency and alignment in data analytics platforms
Jozsef Szalay, Austin, TX (US); and Sergei Kozyrenko, Austin, TX (US)
Assigned to CS DISCO, INC., Austin, TX (US)
Filed by CS Disco, Inc., Austin, TX (US)
Filed on Jun. 15, 2022, as Appl. No. 17/841,508.
Prior Publication US 2023/0409557 A1, Dec. 21, 2023
Int. Cl. G06F 16/23 (2019.01)
CPC G06F 16/2365 (2019.01) 18 Claims
 
1. A data analytics system, comprising:
a processor;
a data store, comprising:
a plurality of dataset definitions, each dataset definition including a consistency time window and a data resolution, wherein the consistency time window defines a first time interval at which data for a corresponding dataset is received from a corresponding data source associated with the dataset and the data resolution defines a second time interval between one or more data records included in the data received from the data source at the first time interval, and wherein each of the data records includes a value;
a plurality of datasets, each dataset corresponding to one of the plurality of dataset definitions; and
a non-transitory computer readable medium comprising instructions for:
for each dataset:
receiving data from the data source corresponding to the dataset at the first time interval, the data comprising one or more data records at the second time interval;
storing the one or more data records of the received data in change sets of the dataset, each change set associated with a beginning time and an end time;
receiving a query comprising a query time, the query associated with the plurality of datasets;
evaluating, by a query processor software module, all of the plurality of datasets to determine a reference time for the plurality of datasets based on the query time, the consistency time window of each dataset, and the data resolution of each dataset, wherein the reference time is a time that is closest in time to the query time of the query where the values for the plurality of datasets are time aligned;
determining, by the query processor software module, the value of each dataset at the reference time from the data record of that dataset associated with the reference time; and
returning the value of each dataset at the reference time and the reference time in response to the query.