US 12,265,533 B2
Dimensionality reduction and model training in a database system implementation of a K nearest neighbors model
Jason Arnold, Chicago, IL (US)
Assigned to Ocient Holdings LLC, Chicago, IL (US)
Filed by Ocient Holdings LLC, Chicago, IL (US)
Filed on Feb. 27, 2023, as Appl. No. 18/174,781.
Claims priority of provisional application 63/374,819, filed on Sep. 7, 2022.
Claims priority of provisional application 63/374,821, filed on Sep. 7, 2022.
Prior Publication US 2024/0289331 A1, Aug. 29, 2024
Int. Cl. G06F 16/2453 (2019.01); G06F 16/28 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/24532 (2019.01) [G06F 16/24537 (2019.01); G06F 16/24542 (2019.01); G06F 16/285 (2019.01); G06N 20/00 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
determining a first query that indicates a first request to generate a K nearest neighbors (KNN) model;
executing the first query to generate KNN model data for the KNN model based on:
determining a full training set of rows;
generating a reduced dataset from the full training set of rows based on, for each iteration of a plurality of training iterations:
generating a set of centroids that includes m centroids by performing a clustering algorithm upon the full training set of rows, wherein m has a value based on a number of iterations performed prior to the each iteration;
generating a new set of rows from the set of centroid values based on identifying a centroid classification label for each of the m centroids from a discrete set of labels by performing a KNN classification algorithm to classify each of the set of centroids by applying the full training set of rows; and
adding the new set of rows to the reduced dataset;
wherein the KNN model data is set as the reduced dataset for one of the plurality of training iterations;
determining a second query that indicates a second request to apply the KNN model to input data; and
executing the second query to generate model output of the KNN model for the input data based on, for each row in the input data, identifying a classification label for the each row from the discrete set of labels based on performing the KNN classification algorithm to classify the each row in the input data by applying the reduced data set.