US 11,748,077 B2
Apparatus and method and computer program product for compiling code adapted for secondary offloads in graphics processing unit
HaiChuan Wang, Shanghai (CN); Song Zhao, Shanghai (CN); GuoFang Jiao, Shanghai (CN); ChengPing Luo, Shanghai (CN); and Zhou Hong, Shanghai (CN)
Assigned to SHANGHAI BIREN TECHNOLOGY CO., LTD, Shanghai (CN)
Filed by Shanghai Biren Technology Co., Ltd, Shanghai (CN)
Filed on Jul. 2, 2021, as Appl. No. 17/367,044.
Claims priority of application No. 202011140485.6 (CN), filed on Oct. 22, 2020.
Prior Publication US 2022/0129255 A1, Apr. 28, 2022
Int. Cl. G06F 8/41 (2018.01); G06T 1/60 (2006.01); G06T 15/00 (2011.01)

CPC G06F 8/458 (2013.01) [G06F 8/447 (2013.01); G06T 1/60 (2013.01); G06T 15/005 (2013.01)]

17 Claims

1. A method for compiling code adapted for secondary offloads in a graphics processing unit (GPU), performed by a processing unit, comprising:

reconstructing a plurality of execution codes in a first kernel into a second kernel,

wherein the second kernel comprises an operation table and a plurality of computation codes,

wherein the operation table comprises a plurality of entries,

wherein the computation codes comprise a portion of the execution codes, and a plurality of synchronization hooks, and each synchronization hook comprises information indicating one entry of the operation table,

wherein an order of the portion of the execution codes and the synchronization hooks in the computation codes matches an order of the execution codes in the first kernel, thereby enabling a compute unit (CU) in the GPU to execute the computation codes, and an engine in the GPU to instruct a component inside or outside of the GPU to complete a designated operation in accordance with content of each entry in the operation table,

wherein a reconstruction of the first kernel into the second kernel comprises:

determining whether each execution code in the first kernel is suitable to be executed by the CU in the GPU;

if so, appending a suitable execution code to the computation codes, whereby enabling the suitable execution code to be executed by the CU when the GPU loads and executes the second kernel; and

otherwise, inserting an entry corresponding to an unsuitable execution code into the operation table, and appending a synchronization hook indicating the entry to the computation codes to replace the unsuitable execution code, whereby enabling the engine in the GPU to instruct a component inside or outside of the GPU to complete a designated operation in accordance with content of the entry in the operation table when the GPU loads and executes the second kernel.
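
The reconstruction recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The names `ExecCode`, `SyncHook`, `SecondKernel`, `CU_OPS`, `is_cu_suitable`, and `reconstruct` are all hypothetical, as is the rule that an execution code is suitable for the CU when its opcode belongs to a fixed set. The claim does not say how suitability is determined or what an operation-table entry contains, so the sketch simply stores the unsuitable execution code itself as the entry.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple, Union


@dataclass(frozen=True)
class ExecCode:
    """One execution code of the first kernel (hypothetical representation)."""
    op: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class SyncHook:
    """Stand-in placed in the computation codes; indicates one table entry."""
    entry_index: int


@dataclass
class SecondKernel:
    """Second kernel: an operation table plus the computation codes."""
    operation_table: List[ExecCode] = field(default_factory=list)
    computation_codes: List[Union[ExecCode, SyncHook]] = field(default_factory=list)


# Assumption for illustration only: opcodes the CU is taken to handle well.
CU_OPS = frozenset({"add", "mul", "fma"})


def is_cu_suitable(code: ExecCode) -> bool:
    """Hypothetical suitability test; the claim leaves the criterion open."""
    return code.op in CU_OPS


def reconstruct(
    first_kernel: Sequence[ExecCode],
    suitable: Callable[[ExecCode], bool] = is_cu_suitable,
) -> SecondKernel:
    """Rebuild the first kernel into a second kernel, preserving code order."""
    second = SecondKernel()
    for code in first_kernel:
        if suitable(code):
            # Suitable code stays in the computation codes for the CU.
            second.computation_codes.append(code)
        else:
            # Unsuitable code becomes a table entry, and a hook pointing at
            # that entry takes its place in the computation codes.
            second.operation_table.append(code)
            entry_index = len(second.operation_table) - 1
            second.computation_codes.append(SyncHook(entry_index))
    return second


# Usage: two offloaded operations surrounding one CU computation.
first_kernel = [
    ExecCode("dma_load", ("A",)),
    ExecCode("mul", ("A", "A")),
    ExecCode("dma_store", ("A",)),
]
second = reconstruct(first_kernel)
```

Because every unsuitable execution code is replaced in place by its synchronization hook, the computation codes keep the order of the first kernel, which is the ordering condition the claim requires. In this example the computation codes become hook 0, the `mul` code, and hook 1, while the operation table holds the two `dma_*` codes.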