US 12,265,534 B2
Generating classification data via a query processing system
George Kondiles, Chicago, IL (US); Jason Arnold, Chicago, IL (US); S. Christopher Gladwin, Chicago, IL (US); Joseph Jablonski, Chicago, IL (US); Daniel Coombs, Chicago, IL (US); and Andrew D. Baptist, Mt. Pleasant, WI (US)
Assigned to Ocient Holdings LLC, Chicago, IL (US)
Filed by Ocient Holdings LLC, Chicago, IL (US)
Filed on Feb. 2, 2023, as Appl. No. 18/163,490.
Application 18/163,490 is a continuation of application No. 16/838,459, filed on Apr. 2, 2020, granted, now 11,580,102.
Prior Publication US 2023/0177048 A1, Jun. 8, 2023
Int. Cl. G06F 16/245 (2019.01); G06F 11/34 (2006.01); G06F 16/22 (2019.01); G06F 16/2453 (2019.01); G06F 16/2455 (2019.01); G06F 16/2457 (2019.01)
CPC G06F 16/24537 (2019.01) [G06F 11/3409 (2013.01); G06F 16/2246 (2019.01); G06F 16/2456 (2019.01); G06F 16/24578 (2019.01)] 20 Claims
 
1. A method for execution by a query processing system of a database system, comprising:
determining a request to implement a K Nearest Neighbors (KNN) algorithm to generate classification data for a set of new records;
determining a query operator execution flow for the request that includes a data sharing operator and a KNN-join operator based on the request; and
generating a query resultant that indicates classification data for the set of new records by performing decentralized execution of a plurality of operator executions in accordance with the query operator execution flow via a set of nodes performing assigned portions of the query operator execution flow based on:
performing decentralized execution of the data sharing operator based on each node of a subset of the set of nodes performing at least one operator execution in accordance with executing the data sharing operator to obtain the set of new records for processing, in parallel with performance of the data sharing operator by other ones of the subset of the set of nodes, by:
accessing, via each node in parallel with all other nodes of the subset of the set of nodes, a corresponding subset of new records of the set of new records in local memory resources of the each node, wherein a first node of the set of nodes accesses a first corresponding subset of new records in first local memory resources, and wherein a second node of the set of nodes accesses a second corresponding subset of new records in second local memory resources separate from the first local memory resources;
sending, via the each node in parallel with all other nodes of the subset of the set of nodes, the corresponding subset of new records to a plurality of other nodes that includes all other nodes of the subset of the set of nodes, wherein the first node sends the first corresponding subset of new records to at least the second node, and wherein the second node sends the second corresponding subset of new records to at least the first node; and
receiving, via the each node in parallel with all other nodes of the subset of the set of nodes, a plurality of other subsets of new records of the set of new records based on receiving a corresponding one of the plurality of other subsets of new records from a corresponding node of the plurality of other nodes, wherein the each node obtains the set of new records to include the corresponding subset of new records accessed in the local memory resources and to further include the plurality of other subsets of new records received from the plurality of other nodes, wherein the first node receives at least the second corresponding subset of new records from the second node, and wherein the second node receives at least the first corresponding subset of new records from the first node; and
performing decentralized execution of the KNN-join operator based on each node of the subset of the set of nodes further performing at least one operator execution in accordance with executing the KNN-join operator, in parallel with decentralized performance of the KNN-join operator by other ones of the subset of the set of nodes, by, for each record of the set of new records:
generating, via the each node in parallel with the all other nodes of the subset of the set of nodes, a plurality of similarity measures by performing a similarity function on the each record and each of a set of previously-classified records;
identifying, via the each node in parallel with the all other nodes of the subset of the set of nodes, a proper subset of the set of previously-classified records that includes exactly a predefined number of records with corresponding similarity measures that are most favorable of the plurality of similarity measures; and
joining, via the each node in parallel with the all other nodes of the subset of the set of nodes, the each record with the proper subset of the set of previously-classified records, wherein corresponding classification data is generated for the each record based on classification data of the proper subset of the set of previously-classified records.
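The two operators recited in claim 1 can be illustrated with a small, single-process sketch. It is not the patented implementation: every name in it (`Record`, `data_sharing`, `similarity`, `knn_join`, `run_query`) is hypothetical, and several details the claim leaves open are filled in by assumption. The similarity function is assumed to be negative Euclidean distance, so a larger value is "more favorable". Classification data is assumed to come from a majority vote over the neighbors' labels. Every node is assumed to hold the same set of previously-classified records. The "parallel" nodes are simulated one after another.

```python
from __future__ import annotations

import heapq
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    rid: str
    features: tuple[float, ...]
    label: Optional[str] = None  # None for a new, not-yet-classified record


def data_sharing(local_subsets: list[list[Record]]) -> list[list[Record]]:
    """Simulated data sharing operator: each node sends its locally accessed
    subset of new records to every other node, so each node obtains the full set."""
    n = len(local_subsets)
    inboxes: list[list[Record]] = [[] for _ in range(n)]
    for sender, subset in enumerate(local_subsets):
        for receiver in range(n):
            if receiver != sender:
                inboxes[receiver].extend(subset)  # "send" to each other node
    # A node's view = its own local subset + every subset it received.
    return [local_subsets[i] + inboxes[i] for i in range(n)]


def similarity(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    # Assumed similarity function: negative Euclidean distance (higher = closer).
    return -math.dist(a, b)


def knn_join(new_records: list[Record], classified: list[Record],
             k: int) -> dict[str, tuple[list[Record], str]]:
    """Simulated KNN-join operator for one node."""
    # The claim requires a *proper* subset of exactly k records.
    if not 0 < k < len(classified):
        raise ValueError("k must select a proper, non-empty subset")
    joined: dict[str, tuple[list[Record], str]] = {}
    for rec in new_records:
        # One similarity measure per previously-classified record.
        scored = [(similarity(rec.features, c.features), c) for c in classified]
        # Keep exactly k records with the largest (most favorable) similarity.
        neighbors = [c for _, c in heapq.nlargest(k, scored, key=lambda p: p[0])]
        # Assumed classification rule: majority vote over the neighbors' labels.
        label = Counter(c.label for c in neighbors).most_common(1)[0][0]
        joined[rec.rid] = (neighbors, label)
    return joined


def run_query(local_subsets: list[list[Record]], classified: list[Record],
              k: int) -> list[dict[str, tuple[list[Record], str]]]:
    # Nodes run sequentially here; in the claimed system they run in parallel.
    views = data_sharing(local_subsets)
    return [knn_join(view, classified, k) for view in views]


# Usage: two nodes, each starting with one new record in "local memory".
classified = [
    Record("a", (0.0, 0.0), "red"),
    Record("b", (0.1, 0.2), "red"),
    Record("c", (5.0, 5.0), "blue"),
    Record("d", (5.2, 4.9), "blue"),
    Record("e", (4.8, 5.1), "blue"),
]
node_local = [[Record("n1", (0.05, 0.1))], [Record("n2", (5.0, 5.05))]]
results = run_query(node_local, classified, k=3)
for rid, (neighbors, label) in sorted(results[0].items()):
    print(rid, label, [c.rid for c in neighbors])
```

Because the data sharing step leaves every node with the complete set of new records, each node's KNN-join result covers all of them. Under the shared-training-set assumption above, all nodes therefore produce the same classifications. Ties in `heapq.nlargest` and `Counter.most_common` are resolved by input order, which is an arbitrary choice made for this sketch.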