| CPC G10L 15/26 (2013.01) [G10L 15/30 (2013.01)] | 20 Claims |

|
1. A method comprising:
receiving a plurality of requests for audio transcription;
determining a compute graph for each request, wherein the compute graph comprises one or more artificial intelligence models;
batching the requests, based on the artificial intelligence models in the compute graph of each request;
loading one or more artificial intelligence models corresponding to a batch to a hardware module;
pushing processing of a request in the batch to the hardware module, loaded with the one or more artificial intelligence models, corresponding to the batch; and
offloading, from the hardware module, artificial intelligence models not needed for processing the requests in the batch.
|
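Read as an algorithm, claim 1 amounts to four steps. Requests are grouped by the set of models their compute graphs need. Those models are made resident on a hardware module, each request in the batch is run there, and models the batch does not use are evicted. The Python sketch below illustrates one possible reading under stated assumptions and is not the patentee's implementation. `Request`, `compute_graph_for`, `HardwareModule`, `serve`, and the model names `asr` and `diarization` are all hypothetical, and a compute graph is reduced to a plain set of model names. The claim lists offloading last. Performing it when the module switches to a new batch, before loading, is an assumption of this sketch.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical transcription request; `audio` is a placeholder payload."""
    request_id: int
    audio: str


def compute_graph_for(request: Request) -> frozenset[str]:
    """Hypothetical rule for choosing the AI models in a request's compute graph.

    Assumption: every request needs an ASR model, and requests whose payload
    mentions 'diarize' also need a speaker-diarization model.
    """
    models = {"asr"}
    if "diarize" in request.audio:
        models.add("diarization")
    return frozenset(models)


@dataclass
class HardwareModule:
    """Hypothetical stand-in for an accelerator holding a set of loaded models."""
    loaded: set[str] = field(default_factory=set)
    log: list[str] = field(default_factory=list)

    def load(self, models: frozenset[str]) -> None:
        # Load only the models that are not already resident.
        for m in sorted(models - self.loaded):
            self.loaded.add(m)
            self.log.append(f"load {m}")

    def offload_unneeded(self, needed: frozenset[str]) -> None:
        # Evict every resident model the current batch does not use.
        for m in sorted(self.loaded - needed):
            self.loaded.discard(m)
            self.log.append(f"offload {m}")

    def process(self, request: Request, needed: frozenset[str]) -> str:
        # Refuse work unless all of the batch's models are loaded.
        if not needed <= self.loaded:
            raise RuntimeError("required models are not loaded")
        return f"transcript-{request.request_id}"  # placeholder output


def serve(requests: list[Request], hw: HardwareModule) -> dict[int, str]:
    # Batch requests whose compute graphs use the same set of models.
    batches: dict[frozenset[str], list[Request]] = defaultdict(list)
    for req in requests:
        batches[compute_graph_for(req)].append(req)

    results: dict[int, str] = {}
    for models, batch in batches.items():
        hw.offload_unneeded(models)  # free models this batch does not need
        hw.load(models)              # load the batch's models onto the module
        for req in batch:            # push each request in the batch to the module
            results[req.request_id] = hw.process(req, models)
    return results
```

For example, `serve([Request(1, "x diarize"), Request(2, "y"), Request(3, "z")], HardwareModule())` forms two batches. The first batch loads `asr` and `diarization` for request 1. The second batch offloads `diarization` and reuses the resident `asr` model for requests 2 and 3. Grouping by the exact model set means two requests share a batch only when their graphs need identical models. A real scheduler might instead group by overlapping sets or cap batch size, neither of which the claim specifies.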