CPC G06F 3/0647 (2013.01) [G06F 3/067 (2013.01); G06F 3/0679 (2013.01); G06F 8/65 (2013.01); G06N 20/00 (2019.01); G06F 12/0261 (2013.01); G06F 12/0862 (2013.01); G06Q 10/06316 (2013.01)] | 20 Claims |
1. A method for managing a machine learning model, comprising:
determining a first instance of a current version for the machine learning model and a second instance of an upgraded version for the machine learning model, the first instance executing a service for processing data, wherein the first instance and the second instance are configured to run at least in part concurrently with one another on one or more graphics processing units to provide uninterrupted access to the service for processing data using one of the first instance and the second instance in conjunction with migration of the service from the first instance to the second instance;
adjusting respectively, if determining that the service is to be migrated from the first instance to the second instance, a first allocation policy for storage space of the first instance and a second allocation policy for storage space of the second instance to a first target policy and a second target policy, wherein the first target policy is used to phase out storage space and the second target policy is used to phase in storage space;
reclaiming allocated storage space for the first instance based on the first target policy; and
allocating required storage space for the second instance based on the second target policy to realize migration of the service;
wherein the storage space comprises memory resources of the one or more graphics processing units;
wherein the first target policy is more restrictive with regard to usage of the memory resources of the one or more graphics processing units than the first allocation policy; and
wherein the second target policy is less restrictive with regard to usage of the memory resources of the one or more graphics processing units than the first target policy.
|
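The sequence recited in claim 1 (adjust both allocation policies, reclaim storage space from the first instance, allocate storage space to the second instance) can be illustrated with a minimal sketch. Everything named here is hypothetical and is not taken from the patent: `Policy`, `Instance`, `GpuPool`, `migrate`, the byte counts, and the stepwise loop. GPU memory is simulated with plain integers rather than a real device API, and the choice to model "more restrictive" as a lower cap plus a ban on new allocations is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical allocation policy for one model instance."""
    name: str
    max_bytes: int   # cap on GPU memory the instance may hold
    allow_new: bool  # whether new allocations are permitted


@dataclass
class Instance:
    """Hypothetical model instance (current or upgraded version)."""
    version: str
    policy: Policy
    allocated: int = 0


class GpuPool:
    """Simulated GPU memory pool shared by both instances."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0

    def allocate(self, inst: Instance, nbytes: int) -> bool:
        # Refuse if the policy forbids it or a cap would be exceeded.
        if not inst.policy.allow_new:
            return False
        if inst.allocated + nbytes > inst.policy.max_bytes:
            return False
        if self.used + nbytes > self.capacity:
            return False
        inst.allocated += nbytes
        self.used += nbytes
        return True

    def reclaim(self, inst: Instance, nbytes: int) -> int:
        # Free at most what the instance actually holds.
        freed = min(nbytes, inst.allocated)
        inst.allocated -= freed
        self.used -= freed
        return freed


def migrate(pool: GpuPool, old: Instance, new: Instance,
            required: int, step: int) -> None:
    """Move the service from `old` to `new` in fixed-size steps."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Adjust both policies: phase out the old, phase in the new.
    old.policy = Policy("phase-out", max_bytes=old.allocated, allow_new=False)
    new.policy = Policy("phase-in", max_bytes=required, allow_new=True)
    while new.allocated < required:
        pool.reclaim(old, step)                   # reclaim per first target policy
        want = min(step, required - new.allocated)
        granted = pool.allocate(new, want)        # allocate per second target policy
        if not granted and old.allocated == 0:
            raise MemoryError("pool cannot satisfy the upgraded instance")
    pool.reclaim(old, old.allocated)              # release whatever remains


def demo() -> tuple[int, int, int]:
    pool = GpuPool(capacity=100)
    old = Instance("v1", Policy("steady", max_bytes=80, allow_new=True))
    new = Instance("v2", Policy("standby", max_bytes=0, allow_new=False))
    pool.allocate(old, 80)
    migrate(pool, old, new, required=80, step=20)
    return old.allocated, new.allocated, pool.used
```

In this sketch both instances hold memory at the same time during the loop, which loosely mirrors the concurrent operation recited in the claim, and interleaving reclaim with allocate keeps total usage within the pool capacity even though the two footprints together would not fit.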