US 11,989,194 B2
Addressing memory limits for partition tracking among worker nodes
Arindam Bhattacharjee, Fremont, CA (US); Sourav Pal, Foster City, CA (US); and Srinivas Bobba, Sunnyvale, CA (US)
Assigned to Splunk Inc., San Francisco, CA (US)
Filed by Splunk Inc., San Francisco, CA (US)
Filed on Oct. 18, 2019, as Appl. No. 16/657,867.
Application 16/657,867 is a continuation in part of application No. 16/398,038, filed on Apr. 29, 2019, granted, now 11,580,107.
Application 16/398,038 is a continuation in part of application No. 16/147,165, filed on Sep. 28, 2018, granted, now 10,956,415.
Application 16/147,165 is a continuation in part of application No. 16/051,197, filed on Jul. 31, 2018, granted, now 11,663,227.
Application 16/051,197 is a continuation in part of application No. 15/665,197, filed on Jul. 31, 2017, granted, now 11,461,334.
Application 15/665,197 is a continuation in part of application No. 15/665,279, filed on Jul. 31, 2017, granted, now 11,416,528.
Application 15/665,279 is a continuation in part of application No. 15/665,302, filed on Jul. 31, 2017, granted, now 10,795,884.
Application 15/665,302 is a continuation in part of application No. 15/665,148, filed on Jul. 31, 2017, granted, now 10,726,009.
Application 15/665,148 is a continuation in part of application No. 15/665,187, filed on Jul. 31, 2017, granted, now 11,232,100.
Application 15/665,187 is a continuation in part of application No. 15/665,339, filed on Jul. 31, 2017, abandoned.
Application 15/665,339 is a continuation in part of application No. 15/665,248, filed on Jul. 31, 2017, granted, now 11,163,758.
Application 15/665,248 is a continuation in part of application No. 15/665,159, filed on Jul. 31, 2017, granted, now 11,281,706.
Prior Publication US 2020/0065303 A1, Feb. 27, 2020
Int. Cl. G06F 16/2458 (2019.01); G06F 16/27 (2019.01)
CPC G06F 16/2471 (2019.01) [G06F 16/278 (2019.01)] 29 Claims
 
1. A computer-implemented method comprising:
obtaining, by at least one worker node of a distributed query execution environment, a chunk of data, wherein the chunk of data comprises a plurality of records associated with a query;
assigning records of the plurality of records to individual data partitions of a set of data partitions at the at least one worker node, wherein individual partitions of the set of data partitions correspond to distinct portions of physical data storage of the at least one worker node;
based on a number of data partitions exceeding a threshold value, combining records across partitions within the set of partitions, wherein combining records across partitions within the set of partitions combines records sharing a field value into a particular partition;
combining the records sharing the field value in the particular partition into a single record having the field value; and
reducing a number of partitions in the set of partitions by:
selecting an additional partition from the set of data partitions to be aggregated with the particular partition, wherein the additional partition is selected from among the set of data partitions based on the additional partition having a highest number of records, among the set of data partitions, that does not exceed a maximum number of records allowable within the additional partition,
aggregating records of the particular partition with records of the additional partition by relocating at least the single record having the field value from the distinct portion of physical data storage corresponding to the particular partition to the distinct portion of physical data storage corresponding to the additional partition, and
removing the particular partition from the at least one worker node.
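The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration, not Splunk's implementation, and it rests on several assumptions the claim does not state. In-memory lists stand in for the distinct portions of physical storage. Records are assigned round-robin. The "particular" partition is taken to be the lowest-numbered partition holding the field value. Matching records are merged by summing a hypothetical `count` field. The capacity test counts the relocated records against `max_records`, so the chosen partition is a best fit. The names `assign_records`, `consolidate`, `threshold`, `max_records`, and the `host`/`count` fields are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical record shape: a dict with a grouping field and a "count" value.
Record = Dict[str, Any]
# In-memory stand-in for partitions; each list models one partition's storage.
Partitions = Dict[int, List[Record]]


def assign_records(chunk: List[Record], num_partitions: int) -> Partitions:
    """Assign records round-robin to partitions (an assumed placement rule)."""
    partitions: Partitions = {pid: [] for pid in range(num_partitions)}
    for i, rec in enumerate(chunk):
        partitions[i % num_partitions].append(rec)
    return partitions


def consolidate(partitions: Partitions, group_field: str, value: Any,
                threshold: int, max_records: int) -> Partitions:
    """Combine records sharing `value`, then fold their partition into a best-fit peer."""
    parts = {pid: list(recs) for pid, recs in partitions.items()}  # work on a copy
    if len(parts) <= threshold:
        return parts  # partition count is within limits; nothing to do

    # Partitions currently holding at least one record with the field value.
    holders = sorted(pid for pid, recs in parts.items()
                     if any(r.get(group_field) == value for r in recs))
    if not holders:
        return parts
    particular = holders[0]  # assumption: lowest id becomes the "particular" partition

    # Pull every matching record out of its partition and merge them into one record.
    total = 0
    for pid in holders:
        kept = []
        for rec in parts[pid]:
            if rec.get(group_field) == value:
                total += rec.get("count", 1)
            else:
                kept.append(rec)
        parts[pid] = kept
    parts[particular].append({group_field: value, "count": total})

    # Best fit: the fullest other partition that can absorb the moved records
    # without exceeding max_records (assumed reading of the capacity limit).
    moving = parts[particular]
    candidates = [pid for pid, recs in parts.items()
                  if pid != particular and len(recs) + len(moving) <= max_records]
    if not candidates:
        return parts  # no partition has room; keep the particular partition
    # Highest record count wins; ties go to the lowest partition id.
    target = max(candidates, key=lambda pid: (len(parts[pid]), -pid))

    parts[target].extend(moving)  # relocate the particular partition's records
    del parts[particular]         # remove the now-empty particular partition
    return parts


if __name__ == "__main__":
    parts = {
        0: [{"host": "a", "count": 1}, {"host": "a", "count": 2}],
        1: [{"host": h, "count": 1} for h in "bcde"],
        2: [{"host": "f", "count": 1}, {"host": "g", "count": 1}],
        3: [{"host": "a", "count": 5}, {"host": "h", "count": 1}],
    }
    out = consolidate(parts, "host", "a", threshold=3, max_records=4)
    print(sorted(out))  # [1, 2, 3]
    print(out[2][-1])   # {'host': 'a', 'count': 8}
```

In this example, partition 1 is already at the cap, so it is skipped. Partitions 2 and 3 both have room, and partition 2 is chosen because it holds more records. The merged `a` record moves there and partition 0 is removed. Picking the fullest partition that still fits, rather than the emptiest, is a best-fit heuristic. It tends to leave the smaller partitions free to absorb later merges.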