US 12,135,711 B2
Implementing nonlinear optimization during query execution via a relational database system
Jason Arnold, Chicago, IL (US)
Assigned to Ocient Holdings LLC, Chicago, IL (US)
Filed by Ocient Holdings LLC, Chicago, IL (US)
Filed on Aug. 29, 2023, as Appl. No. 18/457,496.
Claims priority of provisional application 63/374,819, filed on Sep. 7, 2022.
Claims priority of provisional application 63/374,821, filed on Sep. 7, 2022.
Prior Publication US 2024/0078232 A1, Mar. 7, 2024
Int. Cl. G06F 16/2453 (2019.01); G06F 16/28 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/24532 (2019.01) [G06F 16/24537 (2019.01); G06F 16/24542 (2019.01); G06F 16/285 (2019.01); G06N 20/00 (2019.01)] 19 Claims
OG exemplary drawing
 
1. A method comprising:
determining a query for execution that indicates generating of a machine learning model;
generating a query operator execution flow for the query that includes a plurality of operators implementing a plurality of parallelized optimization processes configured to facilitate generating of the machine learning model, wherein generating the query operator execution flow for the query includes:
determining a parallelization parameter; and
determining an overwrite factor parameter; and
executing the query operator execution flow in conjunction with executing the query based on executing the plurality of operators, wherein executing the query operator execution flow includes:
reading a plurality of rows from memory of a relational database stored in memory resources, wherein a first set of columns of the plurality of rows correspond to a set of independent variables, and wherein at least one additional column of the plurality of rows corresponds to a dependent variable output; and
identifying a plurality of training data subsets from the plurality of rows based on performing a random shuffling process by applying the parallelization parameter and the overwrite factor parameter, wherein each of the plurality of training data subsets is utilized by a corresponding one of the plurality of parallelized optimization processes;
wherein executing each of the plurality of parallelized optimization processes includes generating a corresponding set of candidate model coefficients of a plurality of sets of candidate model coefficients based on, independently from executing other ones of the plurality of parallelized optimization processes:
initializing a set of locations for a set of particles of a search space corresponding to a set of configurable coefficients of the machine learning model, wherein a dimension of the search space is based on a number of coefficients in the set of configurable coefficients; and
performing a first instance of a first algorithm phase based on:
iteratively performing a first type of optimization algorithm independently upon each of the set of particles a plurality of times to update the set of locations and to initialize a set of best positions for the set of particles; and
updating the set of locations and the set of best positions generated via the first type of optimization algorithm based on performing a second type of optimization algorithm that is different from the first type of optimization algorithm;
wherein the corresponding set of candidate model coefficients is based on processing the set of best positions generated via the second type of optimization algorithm;
wherein the machine learning model is generated in executing the query based on selection of a most favorable set of candidate model coefficients from the plurality of sets of candidate model coefficients outputted via the plurality of parallelized optimization processes.
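The sketches below (in Python) illustrate one way the steps recited in claim 1 could be realized; they are not drawn from the patent's specification, and every function name, algorithm choice, and parameter they introduce is an assumption made only for illustration.

Claim 1 reads rows whose first set of columns holds the independent variables and whose additional column holds the dependent variable output, and it searches over a set of configurable model coefficients. The first sketch builds a squared-error loss over rows laid out that way for a linear-plus-intercept model; the model family and the column order (features first, target last) are assumptions, since the claim does not fix either.

    def make_squared_error_loss(rows):
        """Build a loss over candidate coefficients from rows laid out as
        (x_1, ..., x_n, y): leading columns are independent variables, the
        final column is the dependent variable output. The linear-plus-
        intercept model form is an assumption made only for this sketch."""
        def loss(coefficients):
            *weights, intercept = coefficients
            total = 0.0
            for *features, target in rows:
                prediction = intercept + sum(w * x for w, x in zip(weights, features))
                total += (prediction - target) ** 2
            return total
        return loss

With this layout, the dimension of the search space (the number of configurable coefficients) equals the row width: one weight per independent-variable column plus one intercept.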
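The claim identifies a plurality of training data subsets via a random shuffling process that applies the parallelization parameter and the overwrite factor parameter, but it does not define the arithmetic of that shuffle. In the sketch below, the parallelization parameter is taken as the number of subsets and the overwrite factor as the number of subsets each shuffled row is assigned to, so that subsets overlap; both interpretations are assumptions.

    import random

    def shuffle_into_subsets(rows, parallelization, overwrite_factor, seed=None):
        """Illustrative sketch only: shuffle the rows and deal each row into
        `overwrite_factor` of the `parallelization` subsets so the subsets
        overlap. Assumes overwrite_factor <= parallelization."""
        rng = random.Random(seed)
        shuffled = rows[:]            # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        subsets = [[] for _ in range(parallelization)]
        for i, row in enumerate(shuffled):
            # assign the row to `overwrite_factor` consecutive subsets, round robin
            for k in range(overwrite_factor):
                subsets[(i + k) % parallelization].append(row)
        return subsets

Each resulting subset would then feed one of the parallelized optimization processes recited in the claim.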
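Each parallelized optimization process initializes particle locations in a search space whose dimension equals the number of configurable coefficients, then runs a first algorithm phase: a first type of optimization algorithm is iterated independently on each particle to update locations and initialize best positions, after which a second, different type of algorithm updates both the locations and the best positions. The claim does not name either algorithm type; the sketch below assumes, purely for illustration, a per-particle random-perturbation (hill-climbing) step for the first type and a particle-swarm-style velocity update for the second, with made-up hyperparameters.

    import random

    def run_optimization_process(loss, dim, n_particles=16, phase1_iters=50,
                                 phase2_iters=50, bounds=(-10.0, 10.0), seed=None):
        """Illustrative two-phase search over candidate model coefficients.
        Phase 1: independent per-particle random perturbation (a stand-in for
                 the claim's 'first type of optimization algorithm').
        Phase 2: particle-swarm-style update (a stand-in for the 'second type').
        Neither algorithm choice comes from the patent text."""
        rng = random.Random(seed)
        lo, hi = bounds
        # initialize one location per particle in the dim-dimensional search space
        locations = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]

        # phase 1: iterate the first algorithm independently on each particle
        best_positions = []
        for p in range(n_particles):
            x = locations[p]
            fx = loss(x)
            for _ in range(phase1_iters):
                candidate = [xi + rng.gauss(0.0, 0.1) for xi in x]
                fc = loss(candidate)
                if fc < fx:
                    x, fx = candidate, fc
            locations[p] = x
            best_positions.append(x[:])   # initialize this particle's best position

        # phase 2: a different algorithm updates both locations and best positions
        velocities = [[0.0] * dim for _ in range(n_particles)]
        global_best = min(best_positions, key=loss)
        for _ in range(phase2_iters):
            for p in range(n_particles):
                for d in range(dim):
                    velocities[p][d] = (0.7 * velocities[p][d]
                                        + 1.4 * rng.random() * (best_positions[p][d] - locations[p][d])
                                        + 1.4 * rng.random() * (global_best[d] - locations[p][d]))
                    locations[p][d] += velocities[p][d]
                if loss(locations[p]) < loss(best_positions[p]):
                    best_positions[p] = locations[p][:]
            global_best = min(best_positions, key=loss)

        # the process's candidate coefficients are derived from its best positions
        return min(best_positions, key=loss)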
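Finally, the claim selects a most favorable set of candidate model coefficients from those the parallelized processes output. "Most favorable" is not defined in the claim; the driver below, which composes the previous sketches and runs the processes sequentially for clarity, assumes it means lowest loss over all of the rows.

    def train_model(rows, parallelization, overwrite_factor):
        """Illustrative end-to-end flow: shuffle the rows into overlapping
        training subsets, run one optimization process per subset, and keep
        the candidate coefficient set whose loss over all rows is lowest.
        Reading 'most favorable' as 'lowest loss' is an assumption."""
        dim = len(rows[0])    # one weight per feature column plus one intercept
        subsets = shuffle_into_subsets(rows, parallelization, overwrite_factor)
        candidates = [run_optimization_process(make_squared_error_loss(subset), dim)
                      for subset in subsets]
        full_loss = make_squared_error_loss(rows)
        return min(candidates, key=full_loss)

In a database engine each call to run_optimization_process would correspond to one of the claim's parallelized optimization processes executing within the query operator execution flow; the sequential loop here simply keeps the sketch self-contained.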