US 12,229,104 B2
Querying multi-dimensional time series data sets
Benjamin Duffield, New York, NY (US); David Tobin, Atherton, CA (US); Hasan Dincel, London (GB); Mihir Pandya, Palo Alto, CA (US); Stephen Nicholas Barton, New York, NY (US); and Samantha Woodward, New York, NY (US)
Assigned to Palantir Technologies Inc., Denver, CO (US)
Filed by Palantir Technologies Inc., Palo Alto, CA (US)
Filed on Jun. 8, 2020, as Appl. No. 16/895,447.
Claims priority of application No. 1908091 (GB), filed on Jun. 6, 2019.
Prior Publication US 2020/0387492 A1, Dec. 10, 2020
Int. Cl. G06F 16/22 (2019.01); G06F 16/2455 (2019.01); G06F 16/2457 (2019.01); G06F 16/248 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/2264 (2019.01) [G06F 16/2455 (2019.01); G06F 16/24573 (2019.01); G06F 16/248 (2019.01); G06F 16/283 (2019.01)]
20 Claims
1. A method, performed by one or more processors, the method comprising:
under control of a middleware analysis platform:
receiving real-time streaming data originating from a plurality of sensors associated with one or more technical systems, the real-time streaming data representing one or more multi-dimensional time series data sets, the real-time streaming data comprising a plurality of streams associated with respective sensors and representing a dimension relating to a time-varying quantity or parameter measured or detected by the respective sensor at a plurality of time intervals, wherein the middleware analysis platform operates independently from real-time data collection of the one or more multi-dimensional time series data sets by the plurality of sensors;
prior to parsing and cleaning the real-time streaming data, storing the real-time streaming data in a cold storage as raw data received from the plurality of sensors, the raw data comprising unparsed and uncleaned data;
cleaning the real-time streaming data;
parsing a first multi-dimensional time series data set of the one or more multi-dimensional time series data sets received from a first sensor of the plurality of sensors by structuring the real-time streaming data of the first multi-dimensional time series data set according to a first format associated with a first ontology associated with the first sensor;
parsing a second multi-dimensional time series data set of the one or more multi-dimensional time series data sets received from a second sensor of the plurality of sensors by structuring the real-time streaming data of the second multi-dimensional time series data set according to a second format associated with a second ontology associated with the second sensor;
storing the parsed time series data sets in one or more time-series databases;
in response to identifying missing data or erroneous data stored in the one or more time-series databases, retrieving data corresponding to the missing data or the erroneous data from the cold storage and updating the parsed time series data sets in the one or more time-series databases with said retrieved data from the cold storage;
receiving a query for performing one or more computational operations on the parsed time series data sets representing the one or more multi-dimensional time series data sets collected in real-time from the plurality of sensors associated with the one or more technical systems, and wherein the query comprises a user-defined expression comprising a plurality of operation nodes for relating the one or more multi-dimensional time series data sets with each other according to the one or more computational operations;
automatically updating the user-defined expression to reduce a quantity of operation nodes by combining two or more of the plurality of operation nodes to generate a combined operation node;
identifying a location of the one or more multi-dimensional time series data sets in one or more databases based on accessing metadata associated with the one or more multi-dimensional time series data sets in the one or more databases, said one or more databases being pre-registered with the middleware analysis platform, the metadata including identifiers of the one or more multi-dimensional time series data sets and their respective storage locations in the one or more databases;
retrieving the one or more multi-dimensional time series data sets from the one or more databases substantially in real time with receiving the query for performing the one or more computational operations; and
performing, according to the updated user-defined expression, the one or more computational operations on the retrieved one or more multi-dimensional time series data sets to generate a resultant time series data set, wherein the middleware analysis platform is configured to perform the one or more computational operations substantially in real time with receiving the query for performing the one or more computational operations;
displaying, via an interactive graphical user interface, a multi-dimensional visualization of the resultant time series data set to permit a user to analyze one or more states of the one or more technical systems in substantially real-time;
monitoring the resultant time series data set to detect a predetermined condition of the resultant time series data set, wherein the predetermined condition is based on a relationship between the first multi-dimensional time series data set and the second multi-dimensional time series data set;
in response to detecting the predetermined condition of the resultant time series data set, displaying, via the interactive graphical user interface, an alert based on the predetermined condition of the resultant time series data set, the alert comprising:
information relating to the predetermined condition of the resultant time series data set and the one or more technical systems, and
indications of one or more system operations to be performed on the one or more technical systems; and
in response to receiving one or more user selections via the interactive graphical user interface of the indications of the one or more system operations, performing one or more system operations on the one or more technical systems according to the one or more user selections.
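
For illustration only, the following sketch shows one way the expression-related steps of claim 1 could look in code: a user-defined expression built from operation nodes, an automatic update that reduces the node count by combining nodes, evaluation of the updated expression into a resultant time series, and a check for a predetermined condition that depends on the relationship between two sensor series. The claim does not say which nodes are combined or how. The sketch assumes one simple case, fusing a chain of per-point affine operations into a single node. All names here (`Source`, `Affine`, `Subtract`, `combine`, `evaluate`, `breaches`, the series identifiers and the threshold) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# A time series is modelled as (timestamp, value) pairs sorted by timestamp.
Series = List[Tuple[int, float]]


@dataclass(frozen=True)
class Source:
    """Leaf node: refers to a stored time series by identifier (hypothetical)."""
    series_id: str


@dataclass(frozen=True)
class Affine:
    """Operation node: maps every value v of its child to v * scale + offset."""
    child: "Node"
    scale: float = 1.0
    offset: float = 0.0


@dataclass(frozen=True)
class Subtract:
    """Operation node: point-wise left - right on timestamps present in both."""
    left: "Node"
    right: "Node"


Node = Union[Source, Affine, Subtract]


def count_ops(node: Node) -> int:
    """Count operation nodes (leaves are not operations)."""
    if isinstance(node, Source):
        return 0
    if isinstance(node, Affine):
        return 1 + count_ops(node.child)
    return 1 + count_ops(node.left) + count_ops(node.right)


def combine(node: Node) -> Node:
    """Return an equivalent expression in which chained Affine nodes are fused."""
    if isinstance(node, Source):
        return node
    if isinstance(node, Subtract):
        return Subtract(combine(node.left), combine(node.right))
    child = combine(node.child)
    if isinstance(child, Affine):
        # (v * s1 + o1) * s2 + o2  ==  v * (s1 * s2) + (o1 * s2 + o2)
        return Affine(child.child,
                      child.scale * node.scale,
                      child.offset * node.scale + node.offset)
    return Affine(child, node.scale, node.offset)


def evaluate(node: Node, store: Dict[str, Series]) -> Series:
    """Evaluate an expression against an in-memory stand-in for the databases."""
    if isinstance(node, Source):
        return store[node.series_id]
    if isinstance(node, Affine):
        return [(t, v * node.scale + node.offset)
                for t, v in evaluate(node.child, store)]
    right = dict(evaluate(node.right, store))
    return [(t, v - right[t])
            for t, v in evaluate(node.left, store) if t in right]


def breaches(series: Series, limit: float) -> List[int]:
    """Timestamps at which the absolute value of the series exceeds the limit."""
    return [t for t, v in series if abs(v) > limit]


# Hypothetical readings from two sensors, already parsed and stored.
store: Dict[str, Series] = {
    "sensor_a": [(0, 10.0), (1, 12.0), (2, 20.0)],
    "sensor_b": [(0, 9.0), (1, 15.0), (2, 18.0)],
}

# User-defined expression: rescale sensor_a, shift it, then subtract sensor_b.
expr = Subtract(
    Affine(Affine(Source("sensor_a"), scale=2.0), offset=-10.0),
    Source("sensor_b"),
)

fused = combine(expr)              # three operation nodes become two
result = evaluate(fused, store)    # resultant time series
alerts = breaches(result, 5.0)     # predetermined condition: |difference| > 5
print(count_ops(expr), count_ops(fused), result, alerts)
```

In this sketch the fused expression produces the same resultant series as the original one while visiting fewer nodes per point, which is the effect the claimed node reduction is directed at. A real system would still need the surrounding steps of the claim (cold storage, ontology-based parsing, metadata lookup, and the interactive alert workflow), none of which are modelled here.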