US 11,743,143 B2
Service level agreement-based multi-hardware accelerated inference
Francesc Guim Bernat, Barcelona (ES); Kshitij Arun Doshi, Tempe, AZ (US); Suraj Prabhakaran, Aachen (DE); Raghu Kondapalli, San Jose, CA (US); and Alexander Bachmutsky, Sunnyvale, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jun. 6, 2022, as Appl. No. 17/832,903.
Application 17/832,903 is a continuation of application No. 17/066,400, filed on Oct. 8, 2020, granted, now 11,356,339.
Application 17/066,400 is a continuation of application No. 15/857,526, filed on Dec. 28, 2017, granted, now 10,805,179.
Prior Publication US 2022/0407784 A1, Dec. 22, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. H04L 12/70 (2013.01); H04L 41/0806 (2022.01); H04W 48/08 (2009.01); H04L 67/12 (2022.01); H04L 41/5019 (2022.01); H04L 41/5041 (2022.01); H04L 67/61 (2022.01); H04L 67/63 (2022.01); G06N 5/04 (2023.01)
CPC H04L 41/5019 (2013.01) [H04L 41/0806 (2013.01); H04L 41/5045 (2013.01); H04L 67/12 (2013.01); H04L 67/61 (2022.05); H04L 67/63 (2022.05); G06N 5/04 (2013.01)] 30 Claims
 
1. A gateway comprising:
memory; and
processing circuitry coupled to the memory, the processing circuitry to:
receive a request from a requester via a network interface of the gateway, the request including: an inference model identifier that identifies a registered platform of the request and a response time indicator;
transmit the request to the registered platform to handle the request consistent with the response time indicator, the registered platform selected based on a latency of the registered platform and the response time indicator; and
delay a different request having a higher response time indicator than the received request.
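
The behavior recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the names `Platform`, `Request`, `Gateway`, `select_platform`, `receive`, and `dispatch_next` are hypothetical, latencies and response time indicators are assumed to be in milliseconds, and the selection rule (lowest-latency registered platform whose latency fits the indicator) is only one possible reading of "selected based on a latency of the registered platform and the response time indicator". Delaying is modeled as queue ordering: a request with a higher response time indicator waits behind one with a lower indicator.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Platform:
    name: str          # hypothetical platform label
    model_id: str      # inference model this platform is registered for
    latency_ms: float  # assumed known latency of the platform


@dataclass
class Request:
    model_id: str            # inference model identifier
    response_time_ms: float  # response time indicator (assumed: ms budget)
    payload: str = ""


class Gateway:
    """Illustrative gateway: orders requests by response time indicator."""

    def __init__(self, platforms):
        self.platforms = list(platforms)
        self._queue = []  # min-heap keyed on the response time indicator
        self._seq = 0     # tie-breaker so Request objects are never compared

    def select_platform(self, req):
        # Assumption: keep platforms registered for the model whose latency
        # fits within the indicator, then take the lowest-latency one.
        candidates = [
            p for p in self.platforms
            if p.model_id == req.model_id
            and p.latency_ms <= req.response_time_ms
        ]
        return min(candidates, key=lambda p: p.latency_ms) if candidates else None

    def receive(self, req):
        # Requests with a higher indicator sink in the heap, i.e. are delayed.
        heapq.heappush(self._queue, (req.response_time_ms, self._seq, req))
        self._seq += 1

    def dispatch_next(self):
        # Pop the request with the lowest indicator and pick its platform.
        if not self._queue:
            return None
        _, _, req = heapq.heappop(self._queue)
        return req, self.select_platform(req)
```

In this sketch, if a request with a 100 ms indicator arrives before one with a 10 ms indicator, `dispatch_next()` returns the 10 ms request first and the 100 ms request is held until the following call.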