US 12,367,382 B2
Training with adaptive runtime and precision profiling
Brian T. Lewis, Palo Alto, CA (US); Rajkishore Barik, Santa Clara, CA (US); Murali Sundaresan, Sunnyvale, CA (US); Leonard Truong, Santa Clara, CA (US); Feng Chen, Shanghai (CN); Xiaoming Chen, Shanghai (CN); and Mike B. Macpherson, Portland, OR (US)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on May 18, 2021, as Appl. No. 17/323,694.
Application 17/323,694 is a continuation of application No. 15/581,031, filed on Apr. 28, 2017, granted, now 11,017,291.
Prior Publication US 2021/0350215 A1, Nov. 11, 2021
Int. Cl. G06N 3/063 (2023.01); G06F 7/483 (2006.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2023.01)
CPC G06N 3/063 (2013.01) [G06F 7/483 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2013.01)] 23 Claims
OG exemplary drawing
 
1. An apparatus comprising: a graphics hardware processor to:
detect one or more inputs for a first training of a neural network; track, using a hardware counter, precision of floating-point (FP) values during the first training of the neural network by the hardware counter observing exponents of one or more FP registers of hardware to determine a range of values of the exponents stored in the one or more FP registers, wherein the hardware counter is at least one of continuously or periodically reset;
store, in a precision tracking table based on the tracked precision of FP values, a number of bits that are used to represent a single-precision FP value used with each layer of the neural network during the first training of the neural network;
provide automated mixed precision for the neural network by enabling selection of a precision for each layer based on the number of bits stored in the precision tracking table, wherein the precision selected for each layer can differ from that of other layers; and
perform a second training of the neural network utilizing the precision selected for each layer of the neural network.
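The claim describes observing the exponents of FP values during a first training pass, recording a per-layer exponent range in a precision tracking table, and then selecting a (possibly different) precision for each layer in a second training pass. The sketch below is a hypothetical software analogue of that hardware counter, not the patented implementation: the layer names, the `build_precision_table` helper, and the FP16-vs-FP32 decision rule are all illustrative assumptions.

```python
import math

# Normal-exponent range of IEEE 754 binary16 (half precision).
FP16_MIN_EXP, FP16_MAX_EXP = -14, 15

def exponent_range(values):
    """Return (min, max) unbiased base-2 exponents over the observed FP values,
    mimicking a counter that watches the exponent field of FP registers."""
    exps = [math.frexp(v)[1] - 1 for v in values if v != 0.0]
    return (min(exps), max(exps)) if exps else (0, 0)

def build_precision_table(layer_values):
    """Build a per-layer 'precision tracking table': the exponent range seen
    during the first training pass and a suggested storage width in bits."""
    table = {}
    for layer, values in layer_values.items():
        lo, hi = exponent_range(values)
        fits_fp16 = FP16_MIN_EXP <= lo and hi <= FP16_MAX_EXP
        table[layer] = {"exp_range": (lo, hi),
                        "bits": 16 if fits_fp16 else 32}
    return table

# Toy observations from a hypothetical first training pass of two layers.
acts = {"conv1": [0.5, 1.25, -3.0],        # narrow exponent range -> FP16 suffices
        "fc1":   [1e-30, 2.0, 7.5e12]}     # exponents exceed FP16 range -> keep FP32
table = build_precision_table(acts)
```

A second training pass would then consult `table` and run each layer at its recorded precision, giving the per-layer mixed precision the claim recites.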