US 12,216,616 B2
Incrementally improving clustering of cross partition data in a distributed data system
Babatunde Micheal Okutubo, Bellevue, WA (US); Maninderjit Singh Parmar, Redmond, WA (US); and Edgars Sedols, Bellevue, WA (US)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Sep. 8, 2023, as Appl. No. 18/463,941.
Application 18/463,941 is a continuation of application No. 18/058,331, filed on Nov. 23, 2022, granted, now 11,789,902.
Application 18/058,331 is a continuation of application No. 16/881,379, filed on May 22, 2020, granted, now 11,537,557, issued on Dec. 27, 2022.
Prior Publication US 2023/0418784 A1, Dec. 28, 2023
Int. Cl. G06F 16/00 (2019.01); G06F 16/13 (2019.01); G06F 16/27 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/13 (2019.01) [G06F 16/27 (2019.01); G06F 16/285 (2019.01)] 20 Claims
1. A system for improved access to rows of data, each data row associated with a partition of a plurality of partitions, the data rows distributed in a plurality of files, wherein a file including data rows associated with different partitions of the plurality of partitions is an impure file, the system comprising:
a processor; and
a memory device that stores program code to be executed by the processor, the program code causing the processor to:
analyze a depth map that indicates the depth of each target partition of the plurality of partitions, the depth of each target partition based on a number of impure files having a data row associated with the respective target partition;
select a subset of impure files from a plurality of impure files based on the depth map analysis;
sort the data rows of the selected subset of the impure files according to a respective associated target partition of each of the data rows;
generate a set of disjoint partition range files based on the sorting; and
transfer each file of the disjoint partition range files to a respective target partition.
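The steps recited in claim 1 can be sketched in Python to show how the depth map, file selection, sorting, range-file generation and transfer fit together. This is a minimal illustration under stated assumptions, not the patented implementation. The `DataFile` type, the `(partition_key, payload)` row shape, the threshold-based selection policy and every function name are hypothetical. For simplicity, each generated range covers a single partition, which is the narrowest possible disjoint range. That choice makes every output file pure and its target partition unambiguous. The claim itself does not fix the width of a range.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import groupby
from operator import itemgetter


@dataclass
class DataFile:
    """Hypothetical file model: rows are (partition_key, payload) tuples."""
    name: str
    rows: list[tuple[str, str]]

    def partitions(self) -> set[str]:
        return {key for key, _ in self.rows}

    def is_impure(self) -> bool:
        # Impure file: holds rows associated with more than one partition.
        return len(self.partitions()) > 1


def build_depth_map(files: list[DataFile]) -> dict[str, int]:
    """Depth of a partition = number of impure files holding at least one of its rows."""
    depth: Counter = Counter()
    for f in files:
        if f.is_impure():
            depth.update(f.partitions())
    return dict(depth)


def select_impure_files(files, depth_map, depth_threshold):
    """Hypothetical policy: pick impure files touching any partition whose depth >= threshold."""
    deep = {p for p, d in depth_map.items() if d >= depth_threshold}
    return [f for f in files if f.is_impure() and f.partitions() & deep]


def sort_rows(selected: list[DataFile]) -> list[tuple[str, str]]:
    """Merge the rows of the selected files and sort them by partition key."""
    rows = [row for f in selected for row in f.rows]
    return sorted(rows, key=itemgetter(0))


def generate_disjoint_range_files(sorted_rows) -> list[DataFile]:
    """Cut the sorted run at partition boundaries; output ranges never overlap."""
    return [
        DataFile(name=f"range-{key}", rows=list(group))
        for key, group in groupby(sorted_rows, key=itemgetter(0))
    ]


def transfer(range_files: list[DataFile]) -> dict[str, DataFile]:
    """Assign each single-partition output file to its target partition."""
    return {f.rows[0][0]: f for f in range_files}


# Usage example with made-up data.
files = [
    DataFile("f1", [("p1", "a"), ("p2", "b")]),  # impure
    DataFile("f2", [("p2", "c"), ("p3", "d")]),  # impure
    DataFile("f3", [("p1", "e")]),               # pure, not counted in depth
]
depth = build_depth_map(files)  # p1 -> 1, p2 -> 2, p3 -> 1
selected = select_impure_files(files, depth, depth_threshold=2)  # f1 and f2
placed = transfer(generate_disjoint_range_files(sort_rows(selected)))
```

In the example, partition `p2` is the deepest because two impure files contain its rows. Rewriting those two files therefore yields one pure file each for `p1`, `p2` and `p3`. A real system could instead pack several adjacent partitions into one range file to avoid producing many small files. The ranges would stay disjoint as long as the cuts fall on partition boundaries.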