US 12,277,492 B2
Kernel-level load balancing across neural engines
Sundararaman Hariharasubramanian, San Jose, CA (US); Xiaozhong Yao, Cupertino, CA (US); and Andrew Yanowitz, Ben Lomond, CA (US)
Assigned to Apple Inc., Cupertino, CA (US)
Filed by Apple Inc., Cupertino, CA (US)
Filed on Oct. 19, 2021, as Appl. No. 17/505,412.
Claims priority of provisional application 63/197,441, filed on Jun. 6, 2021.
Prior Publication US 2022/0391677 A1, Dec. 8, 2022
Int. Cl. G06N 3/063 (2023.01); G06F 9/50 (2006.01); G06F 9/54 (2006.01); G06N 3/04 (2023.01); G06N 3/06 (2006.01); G06N 5/04 (2023.01)
CPC G06N 3/063 (2013.01) [G06F 9/5083 (2013.01); G06F 9/545 (2013.01); G06N 5/04 (2013.01); G06F 9/5016 (2013.01); G06N 3/04 (2013.01); G06N 3/06 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising performing on an electronic device comprising circuit engines configured to evaluate a particular type of machine learning model:
receiving, at a first system routine from a first client application, a provisioning request indicating that the first client application includes first code for evaluating the particular type of machine learning model, wherein the first system routine executes in user space of memory on the electronic device;
provisioning the first code for execution on one or more of the circuit engines;
receiving, at a second system routine, an inference request from the first client application for evaluating the particular type of machine learning model, the inference request including first input data upon which the particular type of machine learning model is evaluated, wherein the second system routine executes in kernel space of memory on the electronic device;
receiving, at the second system routine, information about a current status and a historical performance of the circuit engines;
assigning, by the second system routine, the inference request to one or more of the circuit engines based on the information;
evaluating, using the one or more of the circuit engines, the inference request; and
providing a result of the inference request to the first client application.
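The kernel-space assignment step of the claim (selecting an engine based on current status and historical performance) can be sketched as a simple scheduling policy. This is an illustrative sketch only, not the patented implementation: the names (`Engine`, `assign_inference_request`, `record_completion`) and the cost heuristic (queued work scaled by historical average evaluation time) are hypothetical choices standing in for the claim's "current status" and "historical performance" inputs.

```python
from dataclasses import dataclass

@dataclass
class Engine:
    """One circuit engine as seen by a kernel-side scheduler (hypothetical model)."""
    engine_id: int
    pending_requests: int = 0          # current status: work already queued
    completed: int = 0                 # historical performance: finished inferences
    total_eval_time_ms: float = 0.0    # historical performance: cumulative latency

    def avg_eval_time_ms(self) -> float:
        # Neutral estimate until any history exists for this engine.
        return self.total_eval_time_ms / self.completed if self.completed else 1.0

def assign_inference_request(engines: list[Engine]) -> Engine:
    """Assign the request to the engine with the lowest estimated time to drain
    its queue plus this request, weighted by its historical average latency."""
    best = min(engines, key=lambda e: (e.pending_requests + 1) * e.avg_eval_time_ms())
    best.pending_requests += 1
    return best

def record_completion(engine: Engine, eval_time_ms: float) -> None:
    """Fold a finished inference back into the engine's status and history."""
    engine.pending_requests -= 1
    engine.completed += 1
    engine.total_eval_time_ms += eval_time_ms
```

For example, with two idle engines the first request goes to either (tie broken by order); once one engine has recorded a slow evaluation, subsequent requests favor the historically faster engine. Any real kernel driver would also account for model residency, memory pressure, and preemption, which this sketch omits.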