US 11,720,583 B2
Processing data from multiple sources
Ian Schechter, Sharon, MA (US); Tim Wakeling, Andover, MA (US); and Ann M. Wollrath, Groton, MA (US)
Assigned to Ab Initio Technology LLC, Lexington, MA (US)
Filed by Ab Initio Technology LLC, Lexington, MA (US)
Filed on Aug. 1, 2022, as Appl. No. 17/878,106.
Application 17/878,106 is a continuation of application No. 16/865,975, filed on May 4, 2020, granted, now 11,403,308.
Application 16/865,975 is a continuation of application No. 15/431,984, filed on Feb. 14, 2017, granted, now 10,642,850, issued on May 5, 2020.
Application 15/431,984 is a continuation of application No. 14/255,579, filed on Apr. 17, 2014, granted, now 9,607,073, issued on Mar. 28, 2017.
Prior Publication US 2022/0365928 A1, Nov. 17, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/24 (2019.01); G06F 16/2458 (2019.01); G06F 16/13 (2019.01); G06F 16/25 (2019.01); G06F 16/28 (2019.01); G06F 16/17 (2019.01); G06F 16/901 (2019.01); G06F 9/50 (2006.01)
CPC G06F 16/2471 (2019.01) [G06F 9/5066 (2013.01); G06F 16/13 (2019.01); G06F 16/1734 (2019.01); G06F 16/254 (2019.01); G06F 16/285 (2019.01); G06F 16/9024 (2019.01); G06F 16/284 (2019.01)]
17 Claims
 
1. A method including:
at a node of a cluster that stores a collection of data that can be operated on in parallel by nodes operating in conjunction with one another to carry out data processing operations on the collection of data, the node storing a first portion of data of the collection of data:
executing, at the node, a first instance of a data processing engine capable of accessing the first portion of data stored at the node and receiving data from a data source external to the cluster;
receiving a computer program by the data processing engine, the computer program configured for accessing the first portion of data stored at the node and including a) at least one component representing the cluster, b) at least one component representing the data source external to the cluster, and c) at least one link that represents at least one dataflow associated with a data processing operation;
executing at least part of the computer program by the first instance of the data processing engine;
accessing, by the data processing engine, the first portion of data stored at the node;
receiving, by the data processing engine, a second portion of data from the external data source; and
performing, by the data processing engine, the data processing operation using at least the first portion of data stored at the node and the second portion of data from the external data source.
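
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it does not define or limit the claim. Every name in it (Component, Link, Program, EngineInstance, join_on_id, demo) is hypothetical and introduced only for illustration. The sketch also assumes in-memory lists in place of cluster storage and an external data source, and it uses a join on an "id" field as an arbitrary example of a data processing operation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass(frozen=True)
class Component:
    """Hypothetical program component; kind is 'cluster' or 'external'."""
    name: str
    kind: str


@dataclass(frozen=True)
class Link:
    """Hypothetical link representing a dataflow between two components."""
    source: str
    target: str


@dataclass
class Program:
    """Hypothetical stand-in for the computer program received by the engine."""
    components: List[Component]
    links: List[Link]
    operation: Callable[[List[Record], List[Record]], List[Record]]

    def validate(self) -> None:
        # Require a cluster component, an external-source component,
        # and at least one dataflow link.
        kinds = {c.kind for c in self.components}
        if "cluster" not in kinds or "external" not in kinds or not self.links:
            raise ValueError("program needs cluster and external components and a link")


class EngineInstance:
    """Hypothetical engine instance executing at one node of the cluster."""

    def __init__(self, local_portion: List[Record],
                 external_source: Callable[[], Iterable[Record]]) -> None:
        self.local_portion = local_portion      # first portion, stored at the node
        self.external_source = external_source  # source external to the cluster

    def run(self, program: Program) -> List[Record]:
        program.validate()                      # receive and check the program
        first = list(self.local_portion)        # access the node-local portion
        second = list(self.external_source())   # receive the external portion
        return program.operation(first, second) # perform the operation on both


def join_on_id(local: List[Record], external: List[Record]) -> List[Record]:
    """Example operation (an assumption): inner join on the 'id' field."""
    by_id = {r["id"]: r for r in external}
    return [{**row, **by_id[row["id"]]} for row in local if row["id"] in by_id]


def demo() -> List[Record]:
    program = Program(
        components=[Component("cluster_data", "cluster"),
                    Component("outside_db", "external")],
        links=[Link("cluster_data", "outside_db")],
        operation=join_on_id,
    )
    engine = EngineInstance(
        local_portion=[{"id": 1, "amount": 10}, {"id": 2, "amount": 20}],
        external_source=lambda: [{"id": 1, "region": "east"},
                                 {"id": 3, "region": "west"}],
    )
    return engine.run(program)
```

In this sketch each node would run its own EngineInstance over its own portion of the collection, so the operation proceeds in parallel across nodes without first copying the cluster data elsewhere.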