US 12,406,203 B2
Fast and scalable multi-tenant serve pool for chatbots
Vishal Vishnoi, Redwood Shores, CA (US); Suman Mallapura Somasundar, Sunnyvale, CA (US); Xin Xu, San Jose, CA (US); and Stevan Malesevic, Glen Ellyn, IL (US)
Assigned to ORACLE INTERNATIONAL CORPORATION, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Apr. 13, 2021, as Appl. No. 17/229,224.
Claims priority of provisional application 63/139,723, filed on Jan. 20, 2021.
Claims priority of provisional application 63/009,118, filed on Apr. 13, 2020.
Prior Publication US 2021/0319360 A1, Oct. 14, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 11/34 (2006.01); G06F 16/2458 (2019.01); G06N 5/043 (2023.01); H04L 51/02 (2022.01)
CPC G06N 20/00 (2019.01) [G06F 11/3409 (2013.01); G06F 16/2471 (2019.01); G06N 5/043 (2013.01); H04L 51/02 (2013.01)]
20 Claims
 
1. A method comprising:
receiving, by a query serving system, a request to serve a query for a new skillbot associated with a new machine-learning model, the query being initiated by a user input of a user, wherein the query serving system includes a first plurality of deployments in a serving pool, and a second plurality of deployments in a free pool,
wherein:
each deployment of the first plurality of deployments and the second plurality of deployments is a computing unit and includes a plurality of sub-containers and a model manager container that hosts a model manager,
each sub-container of the plurality of sub-containers is configured to host one of a plurality of machine-learning models downloadable by the model manager, each machine-learning model of the plurality of machine-learning models being associated with a skillbot,
the first plurality of deployments includes (a) active sub-containers, each of the active sub-containers having loaded therein a machine-learning model of the plurality of machine-learning models, the new machine-learning model has not been loaded into any active sub-container, and (b) vacant sub-containers, each of the vacant sub-containers not having loaded therein any machine-learning model of the plurality of machine-learning models, and
the second plurality of deployments includes the vacant sub-containers and no active sub-container;
determining whether a number of the first plurality of deployments is less than a predetermined number;
in response to the determining that the number of the first plurality of deployments is not less than the predetermined number, selecting, by the query serving system, a first deployment from the serving pool, to be assigned to the new skillbot, the selecting from the serving pool comprising selecting the first deployment as a deployment having one or more vacant sub-containers including a first sub-container;
in response to the determining that the number of the first plurality of deployments is less than the predetermined number, selecting, by the query serving system, a second deployment from the free pool, to be assigned to the new skillbot;
downloading, respectively, by a first model manager of a first model manager container included in the first deployment or a second model manager of a second model manager container included in the second deployment, the new machine-learning model that is trained to serve the query for the new skillbot;
loading, by the query serving system, the new machine-learning model respectively into the first sub-container or a second sub-container included in the second deployment; and
utilizing, by the query serving system, the new machine-learning model associated with the new skillbot to serve the query initiated by the user by providing, as an input, the user input to the new machine-learning model and obtaining, as an output of the new machine-learning model, a response to the query, wherein the response to the query is provided to the user,
wherein the method further comprises:
identifying, by the query serving system for at least one deployment of the first plurality of deployments, whether the at least one deployment includes an active sub-container that satisfies a certain criterion among the active sub-containers of the at least one deployment; and
deleting, by the query serving system, a machine-learning model loaded in the identified active sub-container, wherein the identified active sub-container thereafter becomes one of the vacant sub-containers.
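
For illustration only, the pool selection, model loading, and eviction steps recited in claim 1 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The names ServePool, Deployment, SubContainer, min_serving, and evict_idle are hypothetical. The download callable stands in for the model manager. Moving a deployment from the free pool into the serving pool once it is selected is an assumption, as is the least-recently-used idle threshold used for the "certain criterion", which the claim leaves open.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Model = Callable[[str], str]  # hypothetical model type: user input in, response out


@dataclass
class SubContainer:
    model_id: Optional[str] = None   # None means the sub-container is vacant
    model: Optional[Model] = None
    last_used: int = 0               # hypothetical logical clock, used only for eviction

    @property
    def vacant(self) -> bool:
        return self.model_id is None


@dataclass
class Deployment:
    name: str
    subs: List[SubContainer]

    def first_vacant(self) -> Optional[SubContainer]:
        return next((s for s in self.subs if s.vacant), None)


@dataclass
class ServePool:
    serving: List[Deployment]
    free: List[Deployment]
    min_serving: int                   # the claim's "predetermined number"
    download: Callable[[str], Model]   # stand-in for the model manager's download
    clock: int = 0

    def find_loaded(self, model_id: str) -> Optional[SubContainer]:
        for dep in self.serving:
            for sub in dep.subs:
                if sub.model_id == model_id:
                    return sub
        return None

    def assign(self, model_id: str) -> SubContainer:
        """Load a model that is not yet in any active sub-container."""
        if len(self.serving) < self.min_serving:
            # Below the threshold: select a deployment from the free pool.
            if not self.free:
                raise RuntimeError("free pool is empty")
            dep = self.free.pop(0)
            self.serving.append(dep)   # assumption: it then joins the serving pool
        else:
            # At or above the threshold: select a serving deployment with a vacancy.
            dep = next((d for d in self.serving if d.first_vacant() is not None), None)
            if dep is None:
                raise RuntimeError("no vacant sub-container in the serving pool")
        sub = dep.first_vacant()
        if sub is None:
            raise RuntimeError("selected deployment has no vacant sub-container")
        sub.model = self.download(model_id)
        sub.model_id = model_id
        return sub

    def serve(self, model_id: str, user_input: str) -> str:
        """Serve a query, loading the skillbot's model first if needed."""
        self.clock += 1
        sub = self.find_loaded(model_id) or self.assign(model_id)
        sub.last_used = self.clock
        return sub.model(user_input)

    def evict_idle(self, max_idle: int) -> int:
        """Per serving deployment, unload the least recently used model if idle long enough."""
        evicted = 0
        for dep in self.serving:
            active = [s for s in dep.subs if not s.vacant]
            if not active:
                continue
            lru = min(active, key=lambda s: s.last_used)
            if self.clock - lru.last_used >= max_idle:
                lru.model_id, lru.model = None, None   # sub-container becomes vacant
                evicted += 1
        return evicted
```

In this sketch, a request for an unloaded model draws from the free pool only while the serving pool holds fewer than min_serving deployments. Otherwise the model is placed in a vacant sub-container of an existing serving deployment, and evict_idle later returns idle sub-containers to the vacant state.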