US 12,111,832 B2
Techniques for a deterministic distributed cache to accelerate SQL queries
Devarajulu Kavali, Santa Clara, CA (US); Aneesh Malkhed, Bangalore (IN); Sounak Chakraborty, Bangalore (IN); Harish Ramesh Butani, San Jose, CA (US); Vivek Bhaskar, Pune (IN); Sandeep Akinapelli, Fremont, CA (US); and Devaraj Das, Fremont, CA (US)
Assigned to Oracle International Corporation, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Jun. 16, 2021, as Appl. No. 17/349,813.
Claims priority of application No. 202141022725 (IN), filed on May 21, 2021.
Prior Publication US 2022/0374431 A1, Nov. 24, 2022
Int. Cl. G06F 16/2453 (2019.01); G06F 16/2455 (2019.01)
CPC G06F 16/24539 (2019.01) [G06F 16/24552 (2019.01)]
20 Claims
 
1. A computer-implemented method, comprising:
executing, by a distributed computing system providing an analytical data processing service, a cluster comprising a plurality of nodes;
maintaining, by the distributed computing system, a state of the cluster, the state comprising a plurality of token bounds uniformly associated with the plurality of nodes;
receiving, by a driver node of the plurality of nodes, a query for execution;
identifying, based at least in part on the query, a set of one or more data segments corresponding to the query;
computing, by the driver node, a set of tokens corresponding to the set of one or more data segments;
launching, on a first executor executing on a first worker node of the plurality of nodes, a first task to process a first data segment from the set of one or more data segments, the first worker node selected based at least in part on a first token of the set of tokens corresponding to a first pair of token bounds of the plurality of token bounds, the first pair of token bounds associated with the first worker node based at least on a uniform distribution of the plurality of token bounds to one or more executors of the plurality of nodes; and
obtaining, by the first worker node, the first data segment from a distributed cache within the cluster, the distributed cache characterized by the plurality of token bounds uniformly associated with the plurality of nodes of the cluster.
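The token-based placement recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify a hash function, a token range, or a data structure for the token bounds, so the 32-bit token space, the SHA-256 hash, and every name below (`TOKEN_SPACE`, `uniform_token_bounds`, `segment_token`, `select_executor`, `plan_tasks`) are assumptions made for illustration. The sketch covers only how a driver might map a data segment to an executor deterministically. It does not model the cache contents or task launch.

```python
import hashlib
from bisect import bisect_right

# Assumed token range; the claim does not fix a size.
TOKEN_SPACE = 2**32


def uniform_token_bounds(num_executors):
    """Return the lower bound of each executor's token range.

    The ranges are contiguous and equally sized; range i spans from
    lowers[i] up to (but excluding) lowers[i + 1], and the last range
    ends at TOKEN_SPACE.
    """
    step = TOKEN_SPACE // num_executors
    return [i * step for i in range(num_executors)]


def segment_token(segment_id):
    """Hash a segment identifier to a token in [0, TOKEN_SPACE).

    SHA-256 is an assumption; any stable hash would serve.
    """
    digest = hashlib.sha256(segment_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def select_executor(token, lowers, executors):
    """Pick the executor whose pair of token bounds contains the token."""
    index = bisect_right(lowers, token) - 1
    return executors[index]


def plan_tasks(segment_ids, executors):
    """Map each segment to an executor, as a driver node might."""
    lowers = uniform_token_bounds(len(executors))
    return {
        seg: select_executor(segment_token(seg), lowers, executors)
        for seg in segment_ids
    }


if __name__ == "__main__":
    plan = plan_tasks(["seg-a", "seg-b", "seg-c"], ["exec-0", "exec-1", "exec-2", "exec-3"])
    for segment, executor in plan.items():
        print(segment, "->", executor)
```

Because the token depends only on the segment identifier and the bounds depend only on the executor count, repeated queries over the same segment are routed to the same executor in this sketch, which is what would let a locally cached copy of the segment be reused.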