CPC G06F 16/24539 (2019.01) [G06F 16/24552 (2019.01)] | 20 Claims |
1. A computer-implemented method, comprising:
executing, by a distributed computing system providing an analytical data processing service, a cluster comprising a plurality of nodes;
maintaining, by the distributed computing system, a state of the cluster, the state comprising a plurality of token bounds uniformly associated with the plurality of nodes;
receiving, by a driver node of the plurality of nodes, a query for execution;
identifying, based at least in part on the query, a set of one or more data segments corresponding to the query;
computing, by the driver node, a set of tokens corresponding to the set of one or more data segments;
launching, on a first executor executing on a first worker node of the plurality of nodes, a first task to process a first data segment from the set of one or more data segments, the first worker node selected based at least in part on a first token of the set of tokens corresponding to a first pair of token bounds of the plurality of token bounds, the first pair of token bounds associated with the first worker node based at least on a uniform distribution of the plurality of token bounds to one or more executors of the plurality of nodes; and
obtaining, by the first worker node, the first data segment from a distributed cache within the cluster, the distributed cache characterized by the plurality of token bounds uniformly associated with the plurality of nodes of the cluster.
|
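The token-based task placement recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the token space size (`TOKEN_SPACE`), the SHA-256 hash, the round-robin assignment of bounds, and the names `uniform_token_bounds`, `segment_token`, and `select_executor` are all assumptions introduced here for illustration. The sketch only shows one way a driver could map a data segment to a token, find the pair of token bounds containing that token, and pick the associated executor.

```python
import bisect
import hashlib

TOKEN_SPACE = 2**32  # assumed size of the token space (hypothetical)


def uniform_token_bounds(executors, bounds_per_executor=4):
    """Split the token space into equal ranges, assigned round-robin to executors.

    Returns a sorted list of (lower_bound, executor). Each range extends to the
    next lower bound; the last range extends to the end of the token space.
    """
    n = len(executors) * bounds_per_executor
    step = TOKEN_SPACE // n
    return [(i * step, executors[i % len(executors)]) for i in range(n)]


def segment_token(segment_id):
    """Hash a data segment identifier to a token in [0, TOKEN_SPACE)."""
    digest = hashlib.sha256(segment_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TOKEN_SPACE


def select_executor(bounds, token):
    """Return the executor whose pair of token bounds contains the token."""
    lowers = [lower for lower, _ in bounds]
    idx = bisect.bisect_right(lowers, token) - 1
    return bounds[idx][1]


if __name__ == "__main__":
    executors = ["worker1/exec0", "worker2/exec0", "worker3/exec0"]
    bounds = uniform_token_bounds(executors)
    for segment in ["segment-a", "segment-b", "segment-c"]:
        token = segment_token(segment)
        print(segment, token, select_executor(bounds, token))
```

Because the same segment identifier always hashes to the same token, repeated queries over a segment would be routed to the same executor under this sketch, which is what would let that executor's node serve the segment from its share of a distributed cache.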