US 12,353,334 B2
System cache optimizations for deep learning compute engines
Neta Zmora, Tzur Moshe (IL); and Eran Ben-Avi, Haifa (IL)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jan. 9, 2024, as Appl. No. 18/407,816.
Application 18/407,816 is a continuation of application No. 18/168,703, filed on Feb. 14, 2023, granted, now 11,914,525.
Application 18/168,703 is a continuation of application No. 17/307,299, filed on May 4, 2021, granted, now 11,586,558, issued on Feb. 21, 2023.
Application 17/307,299 is a continuation of application No. 15/494,922, filed on Apr. 24, 2017, granted, now 11,003,592, issued on May 11, 2021.
Prior Publication US 2025/0028650 A1, Jan. 23, 2025
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 12/128 (2016.01); G06F 12/084 (2016.01); G06F 12/0895 (2016.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 3/084 (2023.01); G06N 20/00 (2019.01)
CPC G06F 12/128 (2013.01) [G06F 12/084 (2013.01); G06F 12/0895 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01); G06F 2212/601 (2013.01); G06F 2212/6042 (2013.01); G06F 2212/6046 (2013.01); G06N 20/00 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A system comprising:
a last level cache (LLC) dynamically divided into private caches, each private cache corresponding to a respective one of a plurality of compute engines performing concurrent compute operations on different deep learning (DL) layers of a DL neural network;
DL hardware circuitry communicatively coupled to the LLC by an interconnect, wherein the DL hardware circuitry comprises the compute engines to execute the concurrent compute operations on the DL layers using the LLC, and wherein each compute engine corresponds to a different one of the DL layers; and
a system cache controller communicatively coupled to the DL hardware circuitry and the LLC, the system cache controller to:
receive, from the DL hardware circuitry, a cache access request from a first compute engine of the compute engines performing the concurrent compute operations for a first DL layer of the DL neural network; and
direct the cache access request to a first private cache of the private caches of the LLC, the first private cache corresponding to the first compute engine.
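Claim 1 describes, in effect, a routing architecture: the LLC is dynamically carved into per-engine private partitions, and a system cache controller steers each cache access request to the partition owned by the issuing compute engine. The C++ sketch below is a minimal, hypothetical model of that behavior only; the type names (PrivateCache, SystemCacheController), the even way-partitioning policy, and the route() interface are illustrative assumptions and are not taken from the patent's disclosure.

// Hypothetical sketch of the routing behavior in claim 1: an LLC divided
// into per-engine private partitions, and a system cache controller that
// directs each cache access request to the partition corresponding to the
// requesting compute engine. Names and policies are assumptions, not from
// the patent.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// One private partition of the last level cache (LLC). The even
// way-partitioning used below is illustrative only; the patent's
// "dynamically divided" language does not dictate a specific policy.
struct PrivateCache {
    std::size_t way_begin;   // first LLC way assigned to this partition
    std::size_t way_count;   // number of ways assigned
    std::unordered_map<std::uint64_t, std::vector<std::uint8_t>> lines; // tag -> line data
};

struct CacheAccessRequest {
    int           engine_id; // compute engine (one per DL layer) issuing the request
    std::uint64_t address;   // memory address being accessed
};

// System cache controller: owns the partition map and routes requests.
class SystemCacheController {
public:
    // Divide total_ways LLC ways evenly among num_engines compute engines
    // performing concurrent compute operations on different DL layers.
    SystemCacheController(std::size_t total_ways, int num_engines) {
        const std::size_t ways_per_engine =
            total_ways / static_cast<std::size_t>(num_engines);
        for (int e = 0; e < num_engines; ++e) {
            partitions_.push_back(PrivateCache{
                static_cast<std::size_t>(e) * ways_per_engine,
                ways_per_engine,
                {}});
        }
    }

    // Direct a cache access request to the private cache corresponding to
    // the requesting compute engine (claim 1, last limitation).
    PrivateCache& route(const CacheAccessRequest& req) {
        return partitions_.at(static_cast<std::size_t>(req.engine_id));
    }

private:
    std::vector<PrivateCache> partitions_;
};

int main() {
    // Four compute engines, each working on a different DL layer, sharing
    // a 16-way LLC split into four private 4-way partitions.
    SystemCacheController controller(/*total_ways=*/16, /*num_engines=*/4);

    CacheAccessRequest req{/*engine_id=*/1, /*address=*/0x1000};
    PrivateCache& pc = controller.route(req);
    std::printf("engine %d -> LLC ways [%zu, %zu)\n",
                req.engine_id, pc.way_begin, pc.way_begin + pc.way_count);
    return 0;
}

Indexing the partition table by engine_id mirrors the claim's one-to-one mapping between compute engines (one per DL layer) and private caches; a fuller model would also repartition the LLC as layers are scheduled, which the claim's "dynamically divided" language implies but this sketch omits.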