CPC G06F 16/2471 (2019.01) [G06F 9/5066 (2013.01); G06F 16/13 (2019.01); G06F 16/1734 (2019.01); G06F 16/254 (2019.01); G06F 16/285 (2019.01); G06F 16/9024 (2019.01); G06F 16/284 (2019.01)] | 17 Claims |
1. A method including:
at a node of a cluster that stores a collection of data that can be operated on in parallel by nodes operating in conjunction with one another to carry out data processing operations on the collection of data, the node storing a first portion of data of the collection of data:
executing, at the node, a first instance of a data processing engine capable of accessing the first portion of data stored at the node and receiving data from a data source external to the cluster;
receiving a computer program by the data processing engine, the computer program configured for accessing the first portion of data stored at the node and including a) at least one component representing the cluster, b) at least one component representing the data source external to the cluster, and c) at least one link that represents at least one dataflow associated with a data processing operation;
executing at least part of the computer program by the first instance of the data processing engine;
accessing, by the data processing engine, the first portion of data stored at the node;
receiving, by the data processing engine, a second portion of data from the external data source; and
performing, by the data processing engine, the data processing operation using at least the first portion of data stored at the node and the second portion of data from the external data source.
|
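The steps recited in claim 1 can be illustrated with a minimal single-process sketch. It is only an illustration under stated assumptions, not the claimed implementation: the names `Component`, `Link`, `Program`, `EngineInstance`, and `join_on_id`, the choice of a join as the data processing operation, and the sample records are all hypothetical and do not appear in the claim. The external data source is modeled as a plain callable, and the node's first portion of data as an in-memory list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass(frozen=True)
class Component:
    """Hypothetical program component; kind is 'cluster', 'external_source', or 'operation'."""
    name: str
    kind: str


@dataclass(frozen=True)
class Link:
    """Hypothetical link representing a dataflow from one component to another."""
    source: str
    target: str


@dataclass
class Program:
    """Hypothetical computer program: components, links, and one operation."""
    components: List[Component]
    links: List[Link]
    operation: Callable[[List[Record], List[Record]], List[Record]]

    def inputs_of(self, target: str) -> List[Component]:
        # Follow the links that flow into the target component.
        by_name = {c.name: c for c in self.components}
        return [by_name[l.source] for l in self.links if l.target == target]


class EngineInstance:
    """Hypothetical engine instance executing at one node of the cluster."""

    def __init__(self, local_portion: List[Record],
                 external_source: Callable[[], Iterable[Record]]):
        self.local_portion = local_portion      # first portion, stored at the node
        self.external_source = external_source  # stand-in for a source outside the cluster

    def execute(self, program: Program, target: str = "join") -> List[Record]:
        first: List[Record] = []
        second: List[Record] = []
        for comp in program.inputs_of(target):
            if comp.kind == "cluster":
                first = list(self.local_portion)       # access data stored at the node
            elif comp.kind == "external_source":
                second = list(self.external_source())  # receive data from the external source
        # Perform the operation using both portions.
        return program.operation(first, second)


def join_on_id(first: List[Record], second: List[Record]) -> List[Record]:
    # Assumed example operation: inner join on the 'id' field.
    lookup = {r["id"]: r for r in second}
    return [{**r, **lookup[r["id"]]} for r in first if r["id"] in lookup]


program = Program(
    components=[
        Component("cluster_data", "cluster"),
        Component("external_feed", "external_source"),
        Component("join", "operation"),
    ],
    links=[Link("cluster_data", "join"), Link("external_feed", "join")],
    operation=join_on_id,
)

engine = EngineInstance(
    local_portion=[{"id": 1, "amount": 10}, {"id": 2, "amount": 20}],
    external_source=lambda: [{"id": 2, "region": "EU"}, {"id": 3, "region": "US"}],
)

result = engine.execute(program)
print(result)
```

In this sketch each node would run its own `EngineInstance` over its own portion of the collection, so the same `Program` could be handed to several instances in parallel; how instances are distributed and coordinated is outside what the sketch models.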