US 12,014,248 B2
Machine learning performance and workload management
Siar Sarferaz, Heidelberg (DE)
Assigned to SAP SE, Walldorf (DE)
Filed by SAP SE, Walldorf (DE)
Filed on Jul. 2, 2019, as Appl. No. 16/460,311.
Prior Publication US 2021/0004712 A1, Jan. 7, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 9/54 (2006.01); G06F 30/27 (2020.01); G06N 5/04 (2023.01); H04L 41/16 (2022.01)
CPC G06N 20/00 (2019.01) [G06F 9/54 (2013.01); G06F 30/27 (2020.01); G06N 5/04 (2013.01); H04L 41/16 (2013.01)] 19 Claims
OG exemplary drawing
 
1. A method for reducing resource consumption of a database and a machine learning (ML) system, the method implemented by one or more data processors forming part of at least one computing device and comprising:
receiving, during an inference time of the ML system, from an ML application of a database, data comprising a first inference call for a predicted response to the received data, wherein the first inference call is a request to an ML model to generate one or more predictions for which a response is unknown, the database including logic for accessing data stored in the database and being an in-memory database in which all data stored in the in-memory database is kept in main memory separate and distinct from any processor;
generating, by the ML model using the received data, an output comprising the predicted response to the data;
caching, in an inference cache, the output for future inference calls so as to bypass the ML model;
providing, by the ML model, the generated output to the ML application;
receiving a second inference call comprising the data of the first inference call; and
retrieving, from the inference cache, the cached output, wherein the retrieving bypasses the ML model.
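The caching flow recited in claim 1 — run the model on the first inference call, cache the output, and serve a repeated call with identical data from the cache so the model is bypassed — can be sketched as follows. This is an illustrative sketch only; the names `InferenceCache`, `model_fn`, and `infer` are hypothetical and do not appear in the patent.

```python
import hashlib
import json


class InferenceCache:
    """Illustrative inference cache: identical repeat calls bypass the model."""

    def __init__(self, model_fn):
        self._model_fn = model_fn  # the ML model's prediction function
        self._cache = {}           # maps a hash of the call's input data to its output

    def _key(self, data):
        # Deterministic key derived from the inference call's input data
        return hashlib.sha256(
            json.dumps(data, sort_keys=True).encode("utf-8")
        ).hexdigest()

    def infer(self, data):
        key = self._key(data)
        if key in self._cache:
            # Second inference call with the same data: retrieve the
            # cached output, bypassing the ML model entirely.
            return self._cache[key]
        # First inference call: generate the predicted response with the model,
        # then cache it for future inference calls.
        output = self._model_fn(data)
        self._cache[key] = output
        return output


# Usage: the model runs once; the repeated call is served from the cache.
model_invocations = []

def toy_model(data):
    model_invocations.append(data)
    return {"prediction": sum(data["features"])}

cache = InferenceCache(toy_model)
first = cache.infer({"features": [1, 2, 3]})   # runs the model
second = cache.infer({"features": [1, 2, 3]})  # cache hit, model bypassed
```

The keying choice (a hash of the serialized input) is one reasonable way to detect that a second inference call "comprises the data of the first inference call"; the patent itself does not prescribe a key scheme.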