CPC G06F 18/2148 (2023.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01)] | 9 Claims |
1. A computer-implemented method for analysing input data on a device using a trained machine learning, ML, model comprising a plurality of neural network layers, the method comprising:
receiving at least one input data item for analysis;
independently selecting a quantisation level for each of the plurality of neural network layers at runtime;
analysing the received input data item using the selected quantisation levels;
storing first configuration data comprising a selection of quantisation levels;
generating a plurality of items of second configuration data from the first configuration data by introducing noise into the first configuration data;
calculating a latency associated with each of the plurality of items of second configuration data; and
selecting an item of second configuration data of the plurality of items of second configuration data having a lowest latency,
wherein the trained ML model comprises a transitional batch-normalisation layer disposed between a first neural network layer and a second neural network layer of the plurality of neural network layers,
wherein the transitional batch-normalisation layer is configured to compensate for a change in feature distribution between a quantisation level of the first neural network layer and a quantisation level of the second neural network layer.
|
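The configuration-selection steps of claim 1 (storing first configuration data, generating noisy second configuration data, calculating a latency for each item, and selecting the item with the lowest latency) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the claim: the cost table `COST_PER_BITS`, the toy latency model `estimate_latency`, the noise scheme in `perturb` (re-drawing a layer's level with a fixed probability), and the layer sizes. The claim does not specify how noise is introduced or how latency is calculated, and the transitional batch-normalisation layer is not modelled.

```python
import random

# Hypothetical relative cost per quantisation level (bit width).
# Illustrative numbers only, not measurements from any device.
COST_PER_BITS = {4: 1.0, 8: 1.8, 16: 3.5}
LEVELS = sorted(COST_PER_BITS)


def estimate_latency(config, layer_sizes):
    """Toy latency model: each layer costs its size times its bit-width cost."""
    return sum(size * COST_PER_BITS[bits]
               for bits, size in zip(config, layer_sizes))


def perturb(config, rng, flip_prob=0.3):
    """Return a noisy copy of config.

    Each layer's quantisation level is independently re-drawn from LEVELS
    with probability flip_prob (an assumed noise scheme).
    """
    return [rng.choice(LEVELS) if rng.random() < flip_prob else bits
            for bits in config]


def select_config(first_config, layer_sizes, n_candidates=16, seed=0):
    """Generate noisy candidates and return the one with the lowest latency."""
    rng = random.Random(seed)
    candidates = [perturb(first_config, rng) for _ in range(n_candidates)]
    latencies = [estimate_latency(c, layer_sizes) for c in candidates]
    best = min(range(n_candidates), key=latencies.__getitem__)
    return candidates[best], latencies[best]


# First configuration data: one quantisation level per neural network layer.
first_config = [8, 8, 16, 8]
layer_sizes = [100, 200, 50, 10]  # hypothetical per-layer workload

best_config, best_latency = select_config(first_config, layer_sizes)
print(best_config, best_latency)
```

The sketch ranks candidates by latency alone because that is the only selection criterion recited in the claim.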