US 11,886,983 B2
Reducing hardware resource utilization for residual neural networks
Andy Wagner, Cupertino, CA (US); Tiyasa Mitra, San Jose, CA (US); and Marc Tremblay, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Aug. 25, 2020, as Appl. No. 17/002,478.
Prior Publication US 2022/0067490 A1, Mar. 3, 2022
Int. Cl. G06N 3/063 (2023.01); G06F 17/16 (2006.01); G06N 3/04 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2023.01)

CPC G06N 3/063 (2013.01) [G06F 17/16 (2013.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 3/084 (2013.01)]

20 Claims

1. A system comprising:

a set of processing units; and

a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit to:

receive, at a layer included in a neural network, a first matrix;

compress the first matrix to produce a second matrix to reduce an amount of hardware resources utilized to process the second matrix, the second matrix having a reduced dimensionality relative to a dimensionality of the first matrix;

process the second matrix through a network block in the layer included in the neural network;

expand the processed second matrix to produce a third matrix, the third matrix having a dimensionality that is equal to a dimensionality of the first matrix; and

add the third matrix to the first matrix to produce a fourth matrix.
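
The data flow recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how the matrix is compressed, what the network block computes, or how the result is expanded. The linear down-projection and up-projection, the single ReLU layer standing in for the network block, and the names `residual_layer`, `w_down`, `w_block`, and `w_up` are all assumptions made here for illustration, not the patented implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in m]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def residual_layer(first, w_down, w_block, w_up):
    # Compress: (n, d) -> (n, k) with k < d. A linear projection is an
    # assumption; the claim only requires reduced dimensionality.
    second = matmul(first, w_down)
    # Network block (assumed here: one linear map followed by ReLU),
    # operating on the smaller matrix.
    processed = relu(matmul(second, w_block))
    # Expand: (n, k) -> (n, d), restoring the dimensionality of `first`.
    third = matmul(processed, w_up)
    assert len(third) == len(first) and len(third[0]) == len(first[0])
    # Residual connection: third matrix plus the original first matrix.
    return add(third, first)


# Hypothetical example values: n = 2 rows, d = 4 columns, k = 2.
first = [[1.0, 2.0, 3.0, 4.0],
         [0.5, -1.0, 0.0, 2.0]]
w_down = [[1, 0], [0, 1], [1, 0], [0, 1]]        # 4 x 2
w_block = [[1, 0], [0, -1]]                      # 2 x 2
w_up = [[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5]]      # 2 x 4

fourth = residual_layer(first, w_down, w_block, w_up)
print(fourth)  # [[3.0, 2.0, 5.0, 4.0], [0.75, -1.0, 0.25, 2.0]]
```

In this sketch the network block multiplies an n x k matrix rather than an n x d one, which is where a reduction in hardware resources would come from. The residual addition still takes place at the original width d.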