CPC G06F 16/24532 (2019.01) [G06F 16/24537 (2019.01); G06F 16/24542 (2019.01); G06F 16/285 (2019.01); G06N 20/00 (2019.01)]
19 Claims
1. A method comprising:
determining a query for execution that indicates generating of a machine learning model;
generating a query operator execution flow for the query that includes a plurality of operators implementing a plurality of parallelized optimization processes configured to facilitate generating of the machine learning model, wherein generating the query operator execution flow for the query includes:
determining a parallelization parameter; and
determining an overwrite factor parameter; and
executing the query operator execution flow in conjunction with executing the query based on executing the plurality of operators, wherein executing the query operator execution flow includes:
reading a plurality of rows from memory of a relational database stored in memory resources, wherein a first set of columns of the plurality of rows correspond to a set of independent variables, and wherein at least one additional column of the plurality of rows corresponds to a dependent variable output; and
identifying a plurality of training data subsets from the plurality of rows based on performing a random shuffling process by applying the parallelization parameter and the overwrite factor parameter, wherein each of the plurality of training data subsets is utilized by a corresponding one of the plurality of parallelized optimization processes;
wherein executing each of the plurality of parallelized optimization processes includes generating a corresponding set of candidate model coefficients of a plurality of sets of candidate model coefficients based on, independently from executing other ones of the plurality of parallelized optimization processes:
initializing a set of locations for a set of particles of a search space corresponding to a set of configurable coefficients of the machine learning model, wherein a dimension of the search space is based on a number of coefficients in the set of configurable coefficients; and
performing a first instance of a first algorithm phase based on:
iteratively performing a first type of optimization algorithm independently upon each of the set of particles a plurality of times to update the set of locations and to initialize a set of best positions for the set of particles; and
updating the set of locations and the set of best positions generated via the first type of optimization algorithm based on performing a second type of optimization algorithm that is different from the first type of optimization algorithm;
wherein a corresponding set of candidate model coefficients is based on processing the set of best positions generated via the second type of optimization algorithm;
wherein the machine learning model is generated in executing the query based on selection of a most favorable set of candidate model coefficients from a plurality of sets of candidate model coefficients outputted via the plurality of parallelized optimization processes.
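For illustration only, the flow recited in claim 1 can be sketched in Python. The claim does not name the two optimization algorithms, the model family, or the exact meaning of the overwrite factor, so the sketch makes several assumptions: the model is a linear regression with an intercept, the first type of optimization algorithm is an independent per-particle hill climb, the second type is a particle-swarm style update, and the overwrite factor is read as an oversampling ratio that lets the shuffled training data subsets overlap. All function names (`mse`, `make_subsets`, `optimize`, `train`) and all numeric constants are hypothetical and do not come from the claim.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def mse(coeffs, rows):
    """Mean squared error of a linear model; the last coefficient is the intercept."""
    total = 0.0
    for *xs, y in rows:  # leading columns: independent variables; last: dependent
        pred = sum(c * x for c, x in zip(coeffs, xs)) + coeffs[-1]
        total += (pred - y) ** 2
    return total / len(rows)

def make_subsets(rows, parallelization, overwrite_factor, rng):
    """ASSUMPTION: each subset takes ceil(overwrite_factor * n / parallelization)
    rows from a fresh random shuffle, so subsets may overlap."""
    size = min(len(rows), math.ceil(overwrite_factor * len(rows) / parallelization))
    subsets = []
    for _ in range(parallelization):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        subsets.append(shuffled[:size])
    return subsets

def optimize(subset, dim, seed, particles=8, local_steps=30, swarm_steps=40):
    """One optimization process; it shares no state with the other processes."""
    rng = random.Random(seed)
    # Initialize particle locations in a search space with one axis per coefficient.
    locs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    # First algorithm type (ASSUMED: hill climbing), applied to each particle alone.
    for p in range(particles):
        cur = mse(locs[p], subset)
        for _ in range(local_steps):
            trial = [v + rng.gauss(0, 0.5) for v in locs[p]]
            trial_loss = mse(trial, subset)
            if trial_loss < cur:
                locs[p], cur = trial, trial_loss
    # The resulting locations initialize the set of best positions.
    best = [loc[:] for loc in locs]
    best_loss = [mse(b, subset) for b in best]
    # Second algorithm type (ASSUMED: particle-swarm update using shared bests).
    vel = [[0.0] * dim for _ in range(particles)]
    for _ in range(swarm_steps):
        g = best[min(range(particles), key=best_loss.__getitem__)]
        for p in range(particles):
            for d in range(dim):
                vel[p][d] = (0.7 * vel[p][d]
                             + 1.5 * rng.random() * (best[p][d] - locs[p][d])
                             + 1.5 * rng.random() * (g[d] - locs[p][d]))
                locs[p][d] += vel[p][d]
            loss = mse(locs[p], subset)
            if loss < best_loss[p]:
                best[p], best_loss[p] = locs[p][:], loss
    # Candidate coefficients: the most favorable of the best positions.
    return best[min(range(particles), key=best_loss.__getitem__)]

def train(rows, parallelization=4, overwrite_factor=2.0, seed=0):
    dim = len(rows[0])  # one coefficient per independent variable, plus an intercept
    subsets = make_subsets(rows, parallelization, overwrite_factor, random.Random(seed))
    seeds = range(seed + 1, seed + 1 + parallelization)
    with ThreadPoolExecutor(max_workers=parallelization) as pool:
        candidates = list(pool.map(optimize, subsets, [dim] * parallelization, seeds))
    # Select the most favorable candidate, scored here on all rows (an assumption).
    return min(candidates, key=lambda c: mse(c, rows)), candidates

# Toy table: two independent-variable columns and one dependent column.
rows = [(x1, x2, 2 * x1 - 3 * x2 + 1) for x1 in range(-3, 4) for x2 in range(-3, 4)]
best, candidates = train(rows)
print(best, mse(best, rows))
```

In this sketch the thread pool stands in for the plurality of parallelized optimization processes. Because each call to `optimize` receives its own subset and its own seeded generator, the calls could equally be dispatched to separate workers or nodes without changing the result.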