US 11,704,564 B2
Real time context dependent deep learning
Lev Faivishevsky, Kfar Saba (IL); Tomer Bar-On, Petah Tikva (IL); Yaniv Fais, Tel Aviv (IL); Jacob Subag, Kiryat Haim (IL); Jeremie Dreyfuss, Tel Aviv (IL); Amit Bleiweiss, Yad Binyamin (IL); Tomer Schwartz, Even Yehuda (IL); Raanan Yonatan Yehezkel Rohekar, Kiryat Ekron (IL); Michael Behar, Zichron Yaakov (IL); Amitai Armon, Tel-Aviv (IL); and Uzi Sarel, Zichron-Yaakov (IL)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Aug. 17, 2021, as Appl. No. 17/404,153.
Application 17/404,153 is a continuation of application No. 15/494,887, filed on Apr. 24, 2017, granted, now Pat. No. 11,238,338.
Prior Publication US 2022/0076118 A1, Mar. 10, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01)
CPC G06N 3/08 (2013.01) [G06N 20/00 (2019.01); G06N 20/10 (2019.01)] 20 Claims
OG exemplary drawing
 
1. An apparatus comprising:
a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs), the GPGPU to:
receive a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weight inputs;
perform measurements of compute power and latency of the plurality of SMs;
determine a ratio of the compute power for the plurality of SMs; and
assign, in accordance with the ratio of the compute power and in accordance with the latency, the training data in a low precision form and the weight inputs in a high precision form among the plurality of SMs, wherein the low precision form is of lower precision than the high precision form.
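
The sketch below is a minimal, hypothetical illustration of the step sequence recited in claim 1: take per-SM measurements of compute power and latency, determine the compute-power ratio, and assign low-precision training data and high-precision weight inputs among the SMs. It is not the patented implementation; the function name assign_inputs, the FP16/FP32 precision choices, the latency-budget policy, and the proportional split are all assumptions made for illustration.

```python
# Minimal sketch (assumed policy, not the patented method) of splitting
# low-precision training data and high-precision weights across SMs in
# proportion to each SM's share of measured compute power, subject to a
# latency constraint.
import numpy as np

def assign_inputs(training_data, weights, compute_power, latency, latency_budget):
    compute_power = np.asarray(compute_power, dtype=np.float64)
    latency = np.asarray(latency, dtype=np.float64)

    # Only SMs whose measured latency meets the budget receive work (assumption).
    eligible = latency <= latency_budget
    power = np.where(eligible, compute_power, 0.0)
    ratio = power / power.sum()          # per-SM share of total compute power

    # Training data in a low precision form (FP16 here), weights in a high
    # precision form (FP32 here) -- both precision choices are assumptions.
    data_lp = np.asarray(training_data, dtype=np.float16)
    weights_hp = np.asarray(weights, dtype=np.float32)

    # Partition the training samples according to the compute-power ratio.
    counts = np.floor(ratio * len(data_lp)).astype(int)
    counts[np.argmax(ratio)] += len(data_lp) - counts.sum()  # remainder to strongest SM
    splits = np.split(data_lp, np.cumsum(counts)[:-1])

    # Each eligible SM gets the full high-precision weights plus its
    # low-precision shard of the training data.
    return [
        {"sm": i, "data": shard, "weights": weights_hp if eligible[i] else None}
        for i, shard in enumerate(splits)
    ]

# Example: four SMs with unequal compute power; one exceeds the latency budget.
batches = assign_inputs(
    training_data=np.random.rand(1024, 64),
    weights=np.random.rand(64, 10),
    compute_power=[10.0, 20.0, 30.0, 40.0],
    latency=[1.0, 1.2, 5.0, 0.9],
    latency_budget=2.0,
)
for b in batches:
    print(b["sm"], b["data"].shape, None if b["weights"] is None else b["weights"].dtype)
```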