US 12,073,258 B2
Configuration map based sharding for containers in a machine learning serving infrastructure
Yuliya L. Feldman, Campbell, CA (US); Seyedshahin Ashrafzadeh, Foster City, CA (US); Alexandr Nikitin, El Sobrante, CA (US); and Manoj Agarwal, Cupertino, CA (US)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by Salesforce, Inc., San Francisco, CA (US)
Filed on May 28, 2021, as Appl. No. 17/334,592.
Prior Publication US 2022/0382601 A1, Dec. 1, 2022
Int. Cl. G06F 9/50 (2006.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06F 9/5083 (2013.01) [G06N 5/04 (2013.01); G06N 20/00 (2019.01)] 18 Claims
 
1. A method for managing containers in a machine learning (ML) serving infrastructure, the method comprising:
receiving or detecting an update of container metrics including resource usage and serviced requests per ML model or per container, where a plurality of ML models are hosted by and distributed amongst a plurality of containers;
processing the container metrics per ML model or per container to determine recent resource usage and serviced requests per ML model or per container;
rebalancing the distribution of ML models to containers in response to detecting a load imbalance between containers or detecting a stressed container;
identifying the plurality of containers as available to execute ML models;
updating an expected model assignment for each container in the plurality of containers;
sending the expected model assignment to a container manager to implement loading or unloading of ML models at each container of the plurality of containers;
updating the expected model assignment for each container in the plurality of containers in response to the rebalancing of the distribution of ML models to the plurality of containers; and
sending the updated expected model assignment to the container manager to implement moving of ML models between containers according to the updated expected model assignment.
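Claim 1 describes a control loop: collect per-model metrics, reduce them to recent load, detect a stressed container or a load imbalance, compute an updated expected model assignment, and hand it to a container manager that loads, unloads or moves models. The claim does not specify a scoring rule or a rebalancing algorithm. The following is therefore only a minimal Python sketch under stated assumptions, not the patented implementation. The `Sample` schema, the linear `request_weight` score, the `capacity` and `max_gap` thresholds, the greedy move heuristic and the `ContainerManager` class are all hypothetical. The keys of the assignment dictionary stand in for the containers identified as available to execute ML models.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Set

Assignment = Dict[str, Set[str]]  # container id -> ids of the ML models it hosts


@dataclass
class Sample:
    """One metrics update for one ML model (hypothetical schema)."""
    model: str
    cpu: float     # resource usage reported in this update, e.g. CPU cores
    requests: int  # requests serviced since the previous update


def recent_load(samples: List[Sample], window: int = 3,
                request_weight: float = 0.01) -> Dict[str, float]:
    """Score each model from its last `window` updates.

    Combines average resource usage and average serviced requests; the
    linear weighting is an assumption made for illustration only.
    """
    cpu: Dict[str, List[float]] = defaultdict(list)
    reqs: Dict[str, List[int]] = defaultdict(list)
    for s in samples:
        cpu[s.model].append(s.cpu)
        reqs[s.model].append(s.requests)
    return {m: mean(cpu[m][-window:]) + request_weight * mean(reqs[m][-window:])
            for m in cpu}


def container_loads(assignment: Assignment, load: Dict[str, float]) -> Dict[str, float]:
    """Sum the scores of the models hosted on each container."""
    return {c: sum(load.get(m, 0.0) for m in models) for c, models in assignment.items()}


def needs_rebalance(loads: Dict[str, float], capacity: float, max_gap: float) -> bool:
    """True if any container is stressed (over capacity) or loads are imbalanced."""
    stressed = any(v > capacity for v in loads.values())
    imbalanced = max(loads.values()) - min(loads.values()) > max_gap
    return stressed or imbalanced


def rebalance(current: Assignment, load: Dict[str, float], capacity: float,
              max_gap: float, max_moves: int = 100) -> Assignment:
    """Greedy heuristic (hypothetical): move models from the busiest to the idlest container."""
    expected = {c: set(models) for c, models in current.items()}  # leave `current` untouched
    for _ in range(max_moves):
        loads = container_loads(expected, load)
        if not needs_rebalance(loads, capacity, max_gap):
            break
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        gap = loads[src] - loads[dst]
        # Moving a model with score u, where 0 < u < gap, strictly narrows the
        # gap between src and dst; choose the largest such model.
        movable = [m for m in expected[src] if 0 < load.get(m, 0.0) < gap]
        if not movable:
            break  # no single move helps; wait for the next metrics update
        model = max(movable, key=lambda m: load[m])
        expected[src].remove(model)
        expected[dst].add(model)
    return expected


def plan_changes(current: Assignment, expected: Assignment) -> Dict[str, Dict[str, List[str]]]:
    """Per-container models to load and unload so that `current` matches `expected`."""
    plan = {}
    for c in sorted(set(current) | set(expected)):
        have, want = current.get(c, set()), expected.get(c, set())
        plan[c] = {"load": sorted(want - have), "unload": sorted(have - want)}
    return plan


class ContainerManager:
    """Hypothetical stand-in for the container manager that receives assignments."""

    def __init__(self, current: Assignment):
        self.current = {c: set(models) for c, models in current.items()}

    def apply(self, expected: Assignment) -> Dict[str, Dict[str, List[str]]]:
        """Compute the load/unload plan, then record `expected` as the new state."""
        plan = plan_changes(self.current, expected)
        self.current = {c: set(models) for c, models in expected.items()}
        return plan
```

Consider container `c1` hosting models `a`, `b` and `c`, with scores 4.0, 2.0 and 1.0, and container `c2` hosting only `d`, with score 0.5. With `capacity=6.0` and `max_gap=2.0`, container `c1` (load 7.0) is stressed. `rebalance` moves `a` to `c2`, which leaves loads of 3.0 and 4.5. The resulting plan unloads `a` from `c1` and loads it on `c2`, which is a move expressed as one unload plus one load. A greedy single-model move keeps the number of model relocations small. A full bin-packing recomputation might balance load more tightly but would also reshuffle more models.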