US 12,271,730 B2
Compute platform for machine learning model roll-out
Sudhindra Murthy, Bangalore (IN); Divakar Viswanathan, Tiruchirappalli (IN); and Vishal Sood, Bangalore (IN)
Assigned to PAYPAL, INC., San Jose, CA (US)
Filed by PayPal, Inc., San Jose, CA (US)
Filed on Dec. 4, 2023, as Appl. No. 18/527,982.
Application 18/527,982 is a continuation of application No. 17/398,868, filed on Aug. 10, 2021, granted, now 11,868,756.
Prior Publication US 2024/0168750 A1, May 23, 2024
Int. Cl. G06F 11/14 (2006.01); G06F 8/65 (2018.01); G06F 11/3668 (2025.01); G06N 20/20 (2019.01)
CPC G06F 8/65 (2013.01) [G06F 11/1469 (2013.01); G06F 11/3688 (2013.01); G06F 11/3692 (2013.01); G06N 20/20 (2019.01); G06F 2201/82 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A service provider system comprising:
a non-transitory memory; and
one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the service provider system to perform operations comprising:
receiving a machine learning (ML) model package for a deployment in a production computing environment, wherein the ML model package includes an ML model having a plurality of compute items and a non-performant compute item flag of a first one of the plurality of compute items;
identifying the first one of the plurality of compute items in an execution graph of the ML model;
skipping the first one of the plurality of compute items in the execution graph based on the non-performant compute item flag;
deploying, with a corresponding ML engine in the production computing environment, the ML model with the first one of the plurality of compute items skipped in the execution graph of the ML model, wherein the execution graph is accessible from a shared volume for the production computing environment;
receiving an execution request of the ML model by the corresponding ML engine for an output;
determining, from the shared volume, the execution graph having the first one of the plurality of compute items skipped; and
executing the ML model using the ML engine in the production computing environment based on the execution graph, wherein the executing includes parsing the execution graph for the plurality of compute items and causing the first one of the plurality of compute items to be skipped by the ML engine.
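The claimed flow can be illustrated with a minimal, hypothetical sketch: an execution graph is modeled as an ordered list of compute items, one item is marked from a non-performant compute item flag, and the engine parses the graph at execution time, causing the flagged item to be skipped. All names (`ComputeItem`, `ExecutionGraph`, `mark_non_performant`, `execute`) are illustrative and do not appear in the patent.

```python
# Hypothetical sketch of the claimed skip mechanism (not the patented
# implementation): a compute item flagged as non-performant is skipped
# when the ML engine parses and executes the execution graph.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ComputeItem:
    name: str
    op: Callable[[float], float]
    skip: bool = False  # set from the non-performant compute item flag

@dataclass
class ExecutionGraph:
    items: List[ComputeItem]

def mark_non_performant(graph: ExecutionGraph, flagged_name: str) -> None:
    # Identify the flagged compute item in the execution graph and mark it
    # so it is skipped at execution time.
    for item in graph.items:
        if item.name == flagged_name:
            item.skip = True

def execute(graph: ExecutionGraph, x: float) -> float:
    # Parse the execution graph in order, causing flagged items to be
    # skipped by the engine.
    for item in graph.items:
        if item.skip:
            continue
        x = item.op(x)
    return x

graph = ExecutionGraph([
    ComputeItem("scale", lambda v: v * 2),
    ComputeItem("slow_feature", lambda v: v + 100),  # non-performant item
    ComputeItem("shift", lambda v: v + 1),
])
mark_non_performant(graph, "slow_feature")
print(execute(graph, 3.0))  # 3*2 = 6, slow_feature skipped, 6+1 = 7.0
```

In this sketch the skip decision is recorded on the graph itself, mirroring the claim's arrangement in which the execution graph, with the flagged item already skipped, is read from a shared volume rather than recomputed per request.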