US 12,443,399 B1
Method and system for code optimization based on statistical data
Ulf Hanebutte, Gig Harbor, WA (US); Senad Durakovic, Palo Alto, CA (US); Harri Hakkarainen, Los Gatos, CA (US); Chien-Chun Chou, Morgan Hill, CA (US); Veena Karthikeyan, Mountain View, CA (US); and Fu-Hwa Wang, Saratoga, CA (US)
Assigned to Marvell Asia Pte Ltd, Singapore (SG)
Filed by Marvell Asia Pte Ltd, Singapore (SG)
Filed on Mar. 7, 2023, as Appl. No. 18/118,325.
Claims priority of provisional application 63/317,110, filed on Mar. 7, 2022.
Int. Cl. G06F 8/41 (2018.01); G06F 11/34 (2006.01)
CPC G06F 8/443 (2013.01) [G06F 8/48 (2013.01); G06F 11/3452 (2013.01); G06F 11/3457 (2013.01)]
34 Claims
 
1. A compiler implemented method, comprising:
receiving a high-level function in a first high-level code;
compiling the high-level function into a first set of low-level instructions to be executed on a hardware or a simulator;
generating at least one meta data during the compiling, wherein the at least one meta data is generated based on a strategy generated by the compiler, wherein the at least one meta data includes information associated with a layer of a machine learning (ML) model being executed on the hardware or the simulator;
transmitting the first set of low-level instructions to the hardware or the simulator;
receiving a plurality of statistical data generated by the hardware or the simulator in response to execution of the first set of low-level instructions;
determining whether to make changes to the compilation associated with the high-level function in the first high-level code based on the plurality of statistical data, wherein the changes to the compilation includes at least one or more of changing a memory layout, replacing one ML library call with another ML library call, modifying a mapping of data to memory blocks, modifying a precision for an instruction, modifying quantization for the instruction, modifying a processing element to perform a particular operation associated with the instruction, reordering of data dimensions, and a change to methodology to split tensors;
recompiling the high-level function into a second set of low-level instructions to be executed on the hardware or the simulator based on the changes to the compilation; and
transmitting the second set of low-level instructions to the hardware or the simulator.
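
The feedback loop recited in claim 1 (compile, execute, collect statistics, decide, recompile, retransmit) can be illustrated with a small sketch. This is not the patented implementation: every name below (`CompileOptions`, `compile_function`, `run_on_simulator`, `decide_changes`, `optimize`), the toy cache-miss cost model, and the threshold value are hypothetical and chosen only to make the control flow concrete. Of the changes listed in the claim, the sketch models only a change of memory layout.

```python
from dataclasses import dataclass


@dataclass
class CompileOptions:
    # Hypothetical knobs; the claim lists several kinds of changes,
    # only memory layout is varied in this sketch.
    memory_layout: str = "NCHW"
    precision: str = "fp16"


@dataclass
class CompiledProgram:
    instructions: list  # toy stand-in for low-level instructions
    metadata: list      # one record per ML model layer


def compile_function(layers, options):
    """Lower each layer name to a toy instruction and emit per-layer metadata."""
    instructions = [
        f"{name}.{options.memory_layout}.{options.precision}" for name in layers
    ]
    metadata = [{"layer": name, "index": i} for i, name in enumerate(layers)]
    return CompiledProgram(instructions, metadata)


def run_on_simulator(program):
    """Hypothetical simulator: return one statistics record per instruction."""
    stats = []
    for instr, meta in zip(program.instructions, program.metadata):
        # Assumed cost model: conv layers in NCHW layout miss the cache more.
        slow = "conv" in instr and ".NCHW." in instr
        stats.append({"layer": meta["layer"], "cache_misses": 100 if slow else 10})
    return stats


def decide_changes(stats, options, miss_threshold=50):
    """Return new options if the statistics suggest a change, else None."""
    too_many_misses = any(s["cache_misses"] > miss_threshold for s in stats)
    if too_many_misses and options.memory_layout != "NHWC":
        return CompileOptions(memory_layout="NHWC", precision=options.precision)
    return None


def optimize(layers, max_rounds=3):
    """Compile, run, inspect statistics, and recompile while changes are indicated."""
    options = CompileOptions()
    program = compile_function(layers, options)
    stats = run_on_simulator(program)  # "transmit" and execute the first set
    for _ in range(max_rounds):
        new_options = decide_changes(stats, options)
        if new_options is None:
            break
        options = new_options
        program = compile_function(layers, options)  # recompile
        stats = run_on_simulator(program)            # "transmit" the second set
    return program, stats


if __name__ == "__main__":
    final_program, final_stats = optimize(["conv1", "relu1"])
    print(final_program.instructions)
    print(final_stats)
```

In this sketch the per-layer metadata is what lets each statistics record be attributed to a model layer, which is one plausible reading of why the claim ties the metadata to a layer of the ML model; the bounded `max_rounds` loop is a design choice of the sketch, not something the claim requires.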