US 12,405,920 B2
Data file clustering with KD-classifier trees
Prakhar Jain, Sunnyvale, CA (US); Frederick Ryan Johnson, Orem, UT (US); Terry Kim, Bellevue, WA (US); Vijayan Prabhakaran, Los Gatos, CA (US); and Bart Samwel, Oegstgeest (NL)
Assigned to Databricks, Inc., San Francisco, CA (US)
Filed by Databricks, Inc., San Francisco, CA (US)
Filed on Jul. 5, 2023, as Appl. No. 18/218,410.
Prior Publication US 2025/0013606 A1, Jan. 9, 2025
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/10 (2019.01); G06F 16/13 (2019.01); G06F 16/16 (2019.01); G06F 16/22 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/16 (2019.01) [G06F 16/134 (2019.01); G06F 16/2246 (2019.01); G06F 16/285 (2019.01)]
23 Claims
 
1. A method comprising:
receiving, from a client device, a request to ingest one or more data files to a data table in a data storage system, the data table including a set of records, a record including values for one or more keys;
accessing a data classifier tree for the data table, the data classifier tree including a set of nodes and edges, the nodes of the data classifier tree representing conditions with respect to key-values for two or more keys;
for a data file in the one or more data files, traversing the data classifier tree to identify at least one node for the data file, key-values for the two or more keys in the data file satisfying the condition for the identified node and conditions associated with ancestor nodes of the identified node; and
writing the data file to the data storage system in association with the identified node of the data classifier tree.
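
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes, for illustration only, that each data file is summarized by a (min, max) range per key, that each node's condition is a threshold test on one key, and that a file is placed at the deepest node whose condition, together with the conditions of all its ancestors, holds for the whole file. The names `Node`, `classify`, `ingest`, the keys `x` and `y`, the split values, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumption: a file is summarized by a (min, max) range for each key.
KeyRanges = Dict[str, Tuple[float, float]]


@dataclass
class Node:
    """One node of a hypothetical data classifier tree."""
    name: str
    condition: Callable[[KeyRanges], bool]
    children: List["Node"] = field(default_factory=list)


def always(_ranges: KeyRanges) -> bool:
    # The root accepts every file.
    return True


def key_below(key: str, split: float) -> Callable[[KeyRanges], bool]:
    # True when every value of `key` in the file is below `split`.
    return lambda ranges: ranges[key][1] < split


def key_at_or_above(key: str, split: float) -> Callable[[KeyRanges], bool]:
    # True when every value of `key` in the file is at or above `split`.
    return lambda ranges: ranges[key][0] >= split


def classify(root: Node, ranges: KeyRanges) -> Node:
    """Walk down from the root and return the deepest node whose condition
    holds; every ancestor's condition holds by construction of the walk."""
    if not root.condition(ranges):
        raise ValueError("file does not satisfy the root condition")
    node = root
    while True:
        match = next((c for c in node.children if c.condition(ranges)), None)
        if match is None:
            return node  # no child accepts the whole file, so stop here
        node = match


def ingest(root: Node, files: Dict[str, KeyRanges],
           storage: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Record each file under the name of the node it was classified to."""
    for file_name, ranges in files.items():
        node = classify(root, ranges)
        storage.setdefault(node.name, []).append(file_name)
    return storage


# A tree over two keys: split on x at 50, then on y at 10 under the left child.
root = Node("root", always, [
    Node("x<50", key_below("x", 50), [
        Node("x<50,y<10", key_below("y", 10)),
        Node("x<50,y>=10", key_at_or_above("y", 10)),
    ]),
    Node("x>=50", key_at_or_above("x", 50)),
])

files = {
    "a.parquet": {"x": (0, 20), "y": (0, 5)},
    "b.parquet": {"x": (10, 40), "y": (5, 30)},   # straddles the y split
    "c.parquet": {"x": (60, 90), "y": (0, 100)},
    "d.parquet": {"x": (40, 70), "y": (0, 1)},    # straddles the x split
}

storage = ingest(root, files, {})
```

In this sketch a file whose key range straddles a split stays at the parent node rather than being divided among children. That is a simplification chosen here; the claim itself only requires that the file be written in association with at least one identified node.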