| CPC G06N 20/00 (2019.01) [G06F 11/3409 (2013.01); G06F 16/2471 (2019.01); G06N 5/043 (2013.01); H04L 51/02 (2013.01)] | 20 Claims |

|
1. A method comprising:
receiving, by a query serving system, a request to serve a query for a new skillbot associated with a new machine-learning model, the query being initiated by a user input of a user, wherein the query serving system includes a first plurality of deployments in a serving pool, and a second plurality of deployments in a free pool,
wherein:
each deployment of the first plurality of deployments and the second plurality of deployments is a computing unit and includes a plurality of sub-containers and a model manager container that hosts a model manager,
each sub-container of the plurality of sub-containers is configured to host one of a plurality of machine-learning models downloadable by the model manager, each machine-learning model of the plurality of machine-learning models being associated with a skillbot,
the first plurality of deployments includes (a) active sub-containers, each of the active sub-containers having loaded therein a machine-learning model of the plurality of machine-learning models, the new machine-learning model has not been loaded into any active sub-container, and (b) vacant sub-containers, each of the vacant sub-containers not having loaded therein any machine-learning model of the plurality of machine-learning models, and
the second plurality of deployments includes the vacant sub-containers and no active sub-container;
determining whether a number of the first plurality of deployments is less than a predetermined number;
in response to the determining that the number of the first plurality of deployments is not less than the predetermined number, selecting, by the query serving system, a first deployment from the serving pool, to be assigned to the new skillbot, the selecting from the serving pool comprising selecting the first deployment as a deployment having one or more vacant sub-containers including a first sub-container;
in response to the determining that the number of the first plurality of deployments is less than the predetermined number, selecting, by the query serving system, a second deployment from the free pool, to be assigned to the new skillbot;
downloading, respectively, by a first model manager of a first model manager container included in the first deployment or a second model manager of a second model manager container included in the second deployment, the new machine-learning model that is trained to serve the query for the new skillbot;
loading, by the query serving system, the new machine-learning model respectively into the first sub-container or a second sub-container included in the second deployment; and
utilizing, by the query serving system, the new machine-learning model associated with the new skillbot to serve the query initiated by the user by providing, as an input, the user input to the new machine-learning model and obtaining, as an output of the new machine-learning model, a response to the query, wherein the response to the query is provided to the user,
wherein the method further comprises:
identifying, by the query serving system for at least one deployment of the first plurality of deployments, whether the at least one deployment includes an active sub-container that satisfies a certain criterion among the active sub-containers of the at least one deployment; and
deleting, by the query serving system, a machine-learning model loaded in the identified active sub-container, wherein the identified active sub-container thereafter becomes one of the vacant sub-containers.
|
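The pool-selection, loading, and eviction steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. All class, method, and field names are hypothetical, and so is the stubbed download step. Three behaviors are assumptions the claim does not specify: a least-recently-used rule stands in for the "certain criterion", a deployment taken from the free pool is moved into the serving pool, and the free pool is used as a fallback when no serving deployment has a vacant sub-container.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubContainer:
    model: Optional[str] = None  # loaded model name; None means vacant
    last_used: int = 0           # hypothetical recency counter for eviction

    @property
    def vacant(self) -> bool:
        return self.model is None


@dataclass
class Deployment:
    name: str
    sub_containers: List[SubContainer] = field(default_factory=list)

    def first_vacant(self) -> Optional[SubContainer]:
        return next((sc for sc in self.sub_containers if sc.vacant), None)

    def download(self, model_name: str) -> str:
        # Stand-in for the model manager container; a real one would fetch
        # trained model artifacts from a model store (assumption).
        return model_name


class QueryServingSystem:
    def __init__(self, serving: List[Deployment], free: List[Deployment],
                 min_serving: int) -> None:
        self.serving_pool = serving
        self.free_pool = free
        self.min_serving = min_serving  # the "predetermined number"
        self.clock = 0

    def _select(self) -> tuple:
        # Serving pool is used when its size is NOT less than the threshold.
        if len(self.serving_pool) >= self.min_serving:
            for dep in self.serving_pool:
                sc = dep.first_vacant()
                if sc is not None:
                    return dep, sc
        # Otherwise take a deployment from the free pool (also used here as
        # a fallback when no serving deployment has a vacant sub-container).
        if not self.free_pool:
            raise RuntimeError("no vacant sub-container available")
        dep = self.free_pool.pop(0)
        self.serving_pool.append(dep)  # assumption: it joins the serving pool
        sc = dep.first_vacant()
        if sc is None:
            raise RuntimeError("free-pool deployment has no sub-containers")
        return dep, sc

    def serve(self, model_name: str, user_input: str) -> str:
        self.clock += 1
        # Reuse an active sub-container if the model is already loaded.
        for dep in self.serving_pool:
            for sc in dep.sub_containers:
                if sc.model == model_name:
                    sc.last_used = self.clock
                    return self._respond(sc, user_input)
        dep, sc = self._select()
        sc.model = dep.download(model_name)  # download, then load
        sc.last_used = self.clock
        return self._respond(sc, user_input)

    @staticmethod
    def _respond(sc: SubContainer, user_input: str) -> str:
        # Placeholder for model inference.
        return f"[{sc.model}] reply to: {user_input}"

    def evict(self, dep: Deployment) -> Optional[str]:
        # Identify the active sub-container meeting the criterion (assumed
        # here to be least recently used) and delete its model, making the
        # sub-container vacant again.
        active = [sc for sc in dep.sub_containers if not sc.vacant]
        if not active:
            return None
        victim = min(active, key=lambda sc: sc.last_used)
        evicted, victim.model = victim.model, None
        return evicted
```

In this sketch, a query for a model that is not yet loaded goes to a vacant sub-container in the serving pool while that pool meets the threshold, and pulls a deployment from the free pool otherwise. Calling `evict` on a serving deployment frees one sub-container for later reuse.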