CPC G06N 3/08 (2013.01) [G06F 16/2219 (2019.01); G06N 3/04 (2013.01)] | 20 Claims |
11. A data processing system, comprising:
one or more processors; and
a memory coupled to the one or more processors to store instructions, which when executed by the one or more processors, cause the one or more processors to perform operations, the operations including:
receiving a trained artificial intelligence (AI) model having one or more layers;
receiving first input data for offline inferencing;
applying offline inferencing to the trained AI model based on the first input data to generate offline data distributions for the trained AI model;
identifying outlier points in the offline data distributions;
removing a predetermined number of the identified outlier points to generate updated offline data distributions; and
quantizing one or more tensors of the trained AI model based on the updated offline data distributions to generate a low-bit representation AI model, wherein each layer of the AI model includes the one or more tensors, wherein the one or more tensors include activation, weights, or bias tensors.
|
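For illustration only, the operations recited in claim 11 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `remove_outliers` and `quantize_symmetric` are hypothetical, outliers are assumed to be the points of largest magnitude, the quantization is assumed to be symmetric signed 8-bit, and the observed values are made-up sample numbers standing in for a distribution collected during offline inferencing.

```python
from typing import List, Tuple


def remove_outliers(values: List[float], num_to_remove: int) -> List[float]:
    """Drop a predetermined number of outlier points from a distribution.

    Assumption: the outliers are the points with the largest absolute value.
    The claim does not say how outliers are identified.
    """
    if num_to_remove <= 0:
        return list(values)
    ranked = sorted(values, key=abs)          # smallest magnitude first
    keep = max(len(ranked) - num_to_remove, 1)  # always keep at least one point
    return ranked[:keep]


def quantize_symmetric(
    tensor: List[float], calibration: List[float], bits: int = 8
) -> Tuple[List[int], float]:
    """Quantize a tensor to signed integers using a scale derived from
    the (outlier-trimmed) calibration distribution.

    Assumption: symmetric linear quantization with clipping.
    """
    qmax = 2 ** (bits - 1) - 1                # 127 for 8 bits
    max_abs = max(abs(v) for v in calibration)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in tensor]
    return quantized, scale


# Hypothetical activation values observed during offline inferencing.
observed = [0.1, -0.4, 0.25, 0.5, -0.3, 12.0]

# Remove one outlier (12.0), so the range is set by the remaining points.
updated = remove_outliers(observed, num_to_remove=1)

# Quantize a tensor using the updated distribution; 12.0 is clipped to 127.
q_values, scale = quantize_symmetric([0.5, -0.3, 12.0], updated)
print(q_values, scale)
```

Trimming the outlier before computing the scale keeps the integer range focused on the bulk of the distribution; without it, the single value 12.0 would set the scale and the smaller values would collapse into only a few quantization levels.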