CPC G06F 1/3296 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01)] | 20 Claims
1. A method for performing multimode inferencing, comprising:
receiving machine learning model input data from a requestor;
processing the machine learning model input data with a machine learning model using processing hardware at a first power level to generate first output data, wherein the processing hardware comprises a processor operable at one of a plurality of power levels including the first power level and a second power level different from the first power level, each power level corresponding to a different performance level for the machine learning model;
enabling data quantization by a pre-processor after selecting the first power level;
selecting the second power level for the processing hardware based on comparing the first output data to a threshold value;
disabling data quantization by the pre-processor after selecting the second power level;
processing the machine learning model input data with the machine learning model using the processing hardware at the second power level to generate second output data; and
sending the second output data to the requestor.
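The claimed flow can be illustrated with a minimal Python sketch. The claim does not specify the model, the quantization scheme, or which side of the threshold triggers the switch. Everything below is therefore an assumption for illustration only. The names `PowerLevel`, `PreProcessor`, `multimode_infer` and `toy_model` are hypothetical. The pre-processor uses a uniform quantizer with an assumed step of 0.25. The model is assumed to return a confidence score, and a score below the threshold is assumed to be what selects the second (higher) power level. In the sketch, quantization is switched on as soon as the first power level is chosen and switched off as soon as the second level is chosen, mirroring the enable/disable steps.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class PowerLevel(Enum):
    """Hypothetical power levels; the claim only requires two distinct ones."""
    LOW = "low"    # plays the role of the "first power level"
    HIGH = "high"  # plays the role of the "second power level"


@dataclass
class PreProcessor:
    """Hypothetical pre-processor with a switchable uniform quantizer."""
    quantize: bool = False
    step: float = 0.25  # assumed quantization step size

    def __call__(self, data: Sequence[float]) -> List[float]:
        if self.quantize:
            # Snap each value to the nearest multiple of `step`
            # (exact halves round to even, as Python's round() does).
            return [round(v / self.step) * self.step for v in data]
        return list(data)


# A model maps (pre-processed input, power level) to a confidence score.
Model = Callable[[List[float], PowerLevel], float]


def multimode_infer(data: Sequence[float], model: Model, pre: PreProcessor,
                    threshold: float) -> Tuple[float, PowerLevel]:
    # First power level selected: enable quantization, then run the model.
    pre.quantize = True
    first = model(pre(data), PowerLevel.LOW)

    # Assumption: a first output at or above the threshold is good enough,
    # so no second pass is needed.
    if first >= threshold:
        return first, PowerLevel.LOW

    # Second power level selected: disable quantization, then re-run the model.
    pre.quantize = False
    second = model(pre(data), PowerLevel.HIGH)
    return second, PowerLevel.HIGH  # the "second output data" for the requestor


def toy_model(x: List[float], level: PowerLevel) -> float:
    """Toy stand-in model: mean of the inputs, scaled down at low power."""
    scale = 0.5 if level is PowerLevel.LOW else 1.0
    return scale * sum(x) / len(x)


if __name__ == "__main__":
    pre = PreProcessor()
    print(multimode_infer([0.9, 0.8, 0.7], toy_model, pre, threshold=0.6))
```

With these toy inputs, the low-power, quantized pass scores about 0.42. That is below the assumed threshold of 0.6, so the sketch re-runs at the high power level without quantization and returns a score of about 0.8. A real system would replace the scale factor with an actual change in processor operating point, such as frequency or voltage. It would also replace the mean with the model's real output.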