US 11,989,096 B2
Search and retrieval data processing system for computing near real-time data aggregations
John MacLean, Melbourne (AU); and Paul Veiser, Singapore (SG)
Assigned to Ab Initio Technology LLC, Lexington, MA (US)
Filed by Ab Initio Technology LLC, Lexington, MA (US)
Filed on Nov. 23, 2016, as Appl. No. 15/360,449.
Claims priority of provisional application 62/270,257, filed on Dec. 21, 2015.
Prior Publication US 2017/0177446 A1, Jun. 22, 2017
Int. Cl. G06F 11/14 (2006.01); G06F 11/30 (2006.01); G06F 11/34 (2006.01); G06F 16/21 (2019.01); G06F 16/23 (2019.01); G06F 16/2455 (2019.01); G06F 17/40 (2006.01)
CPC G06F 11/1451 (2013.01) [G06F 11/3082 (2013.01); G06F 11/3438 (2013.01); G06F 11/3476 (2013.01); G06F 16/21 (2019.01); G06F 16/23 (2019.01); G06F 16/24565 (2019.01); G06F 16/24568 (2019.01); G06F 17/40 (2013.01); G06F 2201/84 (2013.01)] 22 Claims
1. A method performed by a data processing system for processing data, the method including:
intermittently receiving data from one or more data streams, the received data including data records, a data record including data indicative of one or more events;
as data from the one or more data streams continue to be received, detecting, in the received data records each keyed based on an entity identifier, two or more data records that are each keyed based on a particular entity identifier;
for at least one detected data record including data indicative of a given event, wherein the at least one detected data record is associated with a particular time,
searching for a pre-computed aggregation of first data indicative of the given event and keyed based on the same particular entity identifier as the two or more data records detected in the one or more data streams,
wherein at least some of the first data of the pre-computed aggregation, which is indicative of the given event and keyed based on the same particular entity identifier as the two or more data records detected in the one or more data streams, is associated with a given time from a prior time period,
with the prior time period being defined as a range of given times associated with the at least some of the first data of the pre-computed aggregation that are keyed based on the same particular entity identifier as the two or more data records detected in the one or more data streams, and
wherein the end of the prior time period is prior to or the same as the particular time associated with the at least one data record that is detected in the one or more data streams and that is keyed based on the same particular entity identifier;
accessing near real-time data, which is indicative of the same given event as the pre-computed aggregation and keyed based on the same particular entity identifier as the pre-computed aggregation, from a field in the at least one detected data record and received from the one or more data streams and associated with the particular time that is after or the same as the end of the prior time period that includes the given times associated with the at least some of the first data of the pre-computed aggregation keyed based on the same particular entity identifier;
generating a near real-time aggregation for the same given event as the pre-computed aggregation and for the same particular entity identifier as the pre-computed aggregation by combining (i) the near real-time data, which is indicative of the given event and keyed based on the same particular entity identifier, included in the accessed field of the at least one detected data record that is received from the one or more data streams, (ii) with the pre-computed aggregation of data that is indicative of the given event and that is keyed based on the same particular entity identifier, to produce the near real-time aggregation for the given event and for the same particular entity identifier, with the aggregation being near real-time with regard to when the data in the one or more data streams is received;
populating a data record that is keyed based on the same particular entity identifier with the near real-time aggregation for the given event and for the same particular entity identifier, and with data received from the one or more data streams, by:
inserting, into a field of the data record keyed based on the same particular entity identifier, the near real-time aggregation for the given event and for the same particular entity identifier and generated from combining (i) the near real-time data included in the at least one data record detected from the one or more data streams and that is indicative of the given event and that is keyed based on the same particular event identifier, and (ii) the pre-computed aggregation of data that is indicative of the given event and that is keyed based on the same particular event identifier,
inserting data from at least one of the data records received from the one or more data streams and keyed based on the same particular entity identifier into another field of the data record being populated and keyed based on the same particular entity identifier; and
processing the populated data record by applying one or more rules to the populated data record;
based on applying the rules, writing to memory one or more instructions for initiation of one or more actions; and
publishing the one or more instructions to a queue for initiation of the one or more actions.
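
For illustration only, the data flow recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. Every name in it (PrecomputedAggregation, StreamRecord, near_real_time_aggregation, process, the "sms_sent" event, the threshold rule) is hypothetical. The aggregation is assumed to be a simple sum, the store an in-memory dictionary, and the queue a deque. The step of detecting two or more records with the same entity identifier is omitted for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PrecomputedAggregation:
    entity_id: str
    event: str
    value: float      # assumed here to be a sum over the prior time period
    period_end: int   # end of the prior time period (hypothetical epoch seconds)


@dataclass
class StreamRecord:
    entity_id: str
    event: str
    amount: float     # the field holding the near real-time data
    timestamp: int    # the "particular time" of the record


def near_real_time_aggregation(record, store):
    """Search for a pre-computed aggregation with the same entity and event,
    then combine it with the record's field. Returns None when no aggregation
    exists or its period ends after the record's time."""
    agg = store.get((record.entity_id, record.event))
    if agg is None or agg.period_end > record.timestamp:
        return None
    return agg.value + record.amount


def process(record, store, rules, queue):
    """Populate a record with the aggregation and stream data, apply rules,
    and publish any resulting instructions to the queue."""
    total = near_real_time_aggregation(record, store)
    if total is None:
        return None
    populated = {
        "entity_id": record.entity_id,
        "event": record.event,
        "aggregation": total,             # field holding the near real-time aggregation
        "latest_amount": record.amount,   # another field, taken from the stream
    }
    for rule in rules:
        instruction = rule(populated)
        if instruction is not None:
            queue.append(instruction)     # publish for later initiation of an action
    return populated


# Hypothetical usage: 98 events were aggregated up to time 1000, and a record
# carrying 2 more arrives at time 1005.
store = {
    ("cust-1", "sms_sent"): PrecomputedAggregation("cust-1", "sms_sent", 98.0, 1000)
}
rules = [
    lambda r: {"action": "send_offer", "entity_id": r["entity_id"]}
    if r["aggregation"] >= 100
    else None
]
queue = deque()
populated = process(StreamRecord("cust-1", "sms_sent", 2.0, 1005), store, rules, queue)
```

In this sketch the time check mirrors the claim's condition that the end of the prior time period is prior to or the same as the record's particular time. A record timestamped before the period end is skipped instead of being combined.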