| CPC G06F 30/27 (2020.01) [G06N 5/02 (2013.01)] | 20 Claims |

|
1. A method for using agents at a data lake to assist training, testing, and deploying a model for predicting one or more outputs from specified input to the model, said model having originated at a data science platform communicatively coupled to a data lake, said method comprising:
providing, at the data lake using one or more processors of a computer system at the data lake, a hierarchy of agents with a managing agent at a root node at level 0 of the hierarchy and multiple first level agents at respective nodes at a first level of the hierarchy, each first level agent being a child agent of the managing agent, said first level agents comprising a hardware resources agent, a software resources agent, a data resources agent, a software control agent, and a programming languages agent;
identifying, by the hardware resources agent using the one or more processors, a plurality of hardware resources stored at the data lake and physical characteristics of each hardware resource, each hardware resource being usable for training and testing the model;
identifying, by the data resources agent using the one or more processors, model input data stored at the data lake and an access speed for accessing the model input data and portions of the input data distributed in different storage locations in the data lake, said model input data including data usable as the specified input to the model;
identifying, by the data resources agent using the one or more processors, a first portion of the model input data to be used for said training and testing the model, said identifying the first portion of the model input data being based on a specified target accuracy of the model;
identifying, by the software resources agent using the one or more processors, a first software resource stored at the data lake, said first software resource being program code configured to split the first portion of the model input data into training input data for training the model and testing input data for testing the model;
triggering, by the software control agent using the one or more processors, a first execution at the data lake of the first software resource to split the first portion of the model input data into the training input data and the testing input data;
identifying, by the hardware resources agent using the one or more processors, a first portion of the hardware resources available to be used for training and testing the model, said identifying the first portion of the hardware resources being based on a size of the first portion of model input data, the access speed for accessing the first portion of the model input data, and the physical characteristics of the hardware resources;
identifying, by the programming languages agent using the one or more processors, a data model training language available at the data lake;
identifying, by the software resources agent using the one or more processors, a second software resource stored at the data lake, said second software resource being program code configured to be used for optimizing the model in accordance with an optimization algorithm used during said training the model; and
triggering, by the software control agent using the one or more processors, said training the model at the data lake using the training input data, the first portion of the hardware resources, the data model training language, and the second software resource.
|
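For illustration only, the flow recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Agent` class, all function names, the record and hardware dictionaries, and the selection rules (a larger share of the data for a higher target accuracy; hardware kept when its memory and throughput meet the data size and access speed) are hypothetical and do not appear in the claim. The training step itself is omitted; the sketch stops at the inputs that the software control agent would pass to it.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """One node in the agent hierarchy; level 0 is the managing agent."""
    name: str
    level: int = 0
    children: dict = field(default_factory=dict)

    def add_child(self, name: str) -> "Agent":
        child = Agent(name, self.level + 1)
        self.children[name] = child
        return child


def build_hierarchy() -> Agent:
    """Managing agent at the root with the five first level agents as children."""
    manager = Agent("managing")
    for name in ("hardware_resources", "software_resources", "data_resources",
                 "software_control", "programming_languages"):
        manager.add_child(name)
    return manager


def select_data_portion(records: list, target_accuracy: float) -> list:
    """Assumed rule: a higher target accuracy selects a larger share of the data."""
    count = max(2, round(len(records) * target_accuracy))
    return records[:count]


def split_train_test(portion: list, train_percent: int = 80) -> tuple:
    """Stand-in for the first software resource: split into training and testing data."""
    cut = len(portion) * train_percent // 100
    return portion[:cut], portion[cut:]


def select_hardware(hardware: list, data_size: int, access_speed: float) -> list:
    """Assumed rule: keep hardware whose memory and throughput meet the data's needs."""
    return [h for h in hardware
            if h["memory"] >= data_size and h["throughput"] >= access_speed]


def prepare_training(records: list, hardware: list,
                     target_accuracy: float, access_speed: float) -> dict:
    """Walk the claimed steps in order, noting which agent performs each one."""
    manager = build_hierarchy()
    steps = []

    portion = select_data_portion(records, target_accuracy)
    steps.append(("data_resources", "select first portion of input data"))

    train, test = split_train_test(portion)
    steps.append(("software_control", "trigger split into training and testing data"))

    machines = select_hardware(hardware, len(portion), access_speed)
    steps.append(("hardware_resources", "select first portion of hardware"))

    # Every step must be attributed to an agent that exists in the hierarchy.
    assert all(agent in manager.children for agent, _ in steps)
    return {"agents": sorted(manager.children), "train": train, "test": test,
            "hardware": [h["name"] for h in machines], "steps": steps}
```

A hypothetical call might pass `list(range(100))` as the records, two hardware descriptions such as `{"name": "gpu-a", "memory": 64, "throughput": 2.0}`, a target accuracy of `0.5`, and an access speed of `1.0`; the returned dictionary then holds the training and testing data and the names of the selected hardware.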