US 12,443,398 B2
Kernel fusion for machine learning
Andrew Kerr, Santa Clara, CA (US); Mike Murphy, Newark, CA (US); Mostafa Hagog, Folsom, CA (US); Julien Demouth, Paris (FR); and John Tran, Denver, CO (US)
Assigned to NVIDIA Corporation, Santa Clara, CA (US)
Filed by NVIDIA Corporation, Santa Clara, CA (US)
Filed on Oct. 2, 2019, as Appl. No. 16/591,306.
Prior Publication US 2021/0103433 A1, Apr. 8, 2021
Int. Cl. G06F 8/41 (2018.01); G06F 9/455 (2018.01); G06N 20/10 (2019.01)
CPC G06F 8/41 (2013.01) [G06F 9/45525 (2013.01); G06N 20/10 (2019.01)]
28 Claims
 
1. One or more processors, comprising: processing circuitry to:
compile one or more intermediate representation (IR) host code software programs comprising one or more user-specified functions;
separately compile one or more object code device software programs comprising one or more call sites that correspond to the one or more user-specified functions; and
compile the one or more IR host code software programs together with the one or more object code device software programs to generate a single object code software program.
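The three compilation steps recited in claim 1 can be pictured with a toy model. The sketch below is illustrative only and is not the claimed implementation or any NVIDIA toolchain: the names HostIR, DeviceObject, compile_host_ir, compile_device_object, and fuse are hypothetical, and Python callables stand in for IR and object code. It assumes each call site is identified by the name of the user-specified function it corresponds to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HostIR:
    """Hypothetical stand-in for an IR host program holding user-specified functions."""
    functions: Dict[str, Callable[[float], float]]


@dataclass
class DeviceObject:
    """Hypothetical stand-in for a separately compiled device program.

    call_sites lists, in order, the names of the user-specified
    functions that the device program expects to call.
    """
    name: str
    call_sites: List[str]


def compile_host_ir(user_functions: Dict[str, Callable[[float], float]]) -> HostIR:
    # Step 1: "compile" the host program containing the user-specified functions.
    return HostIR(functions=dict(user_functions))


def compile_device_object(name: str, call_sites: List[str]) -> DeviceObject:
    # Step 2: separately "compile" the device program; call sites stay unresolved.
    return DeviceObject(name=name, call_sites=list(call_sites))


def fuse(host_ir: HostIR, device_obj: DeviceObject) -> Callable[[float], float]:
    # Step 3: combine both inputs into a single program by binding each
    # call site to the matching user-specified function.
    missing = [s for s in device_obj.call_sites if s not in host_ir.functions]
    if missing:
        raise KeyError(f"unresolved call sites: {missing}")
    resolved = [host_ir.functions[s] for s in device_obj.call_sites]

    def fused_program(x: float) -> float:
        for fn in resolved:
            x = fn(x)
        return x

    return fused_program


# Example: two user-specified functions bound into one fused program.
host_ir = compile_host_ir({
    "scale": lambda x: 2.0 * x,
    "relu": lambda x: max(x, 0.0),
})
device_obj = compile_device_object("gemm_epilogue", ["scale", "relu"])
program = fuse(host_ir, device_obj)
```

In this model, a call site with no matching user-specified function is reported when the two parts are combined rather than when the fused program runs, which mirrors how an unresolved symbol would typically surface at link time.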