US 11,816,549 B2
Asynchronous gradient weight compression
Wei Zhang, Elmsford, NY (US); and Chia-Yu Chen, Westchester, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 29, 2018, as Appl. No. 16/204,770.
Prior Publication US 2020/0175422 A1, Jun. 4, 2020
Int. Cl. G06N 3/098 (2023.01); G06N 20/20 (2019.01); G06N 7/08 (2006.01)

CPC G06N 20/20 (2019.01) [G06N 3/098 (2023.01); G06N 7/08 (2013.01)]

19 Claims

1. A system, comprising:

a memory that stores computer executable components; and

a processor that executes the computer executable components stored in the memory, wherein the computer executable components comprise:

a pointer component that identifies compressed gradient weights, from a first group of second learning entities of a distributed machine learning system, not present in a first concatenated compressed gradient weight vector that was previously sent to a first learning entity of the distributed machine learning system, wherein the first concatenated compressed gradient weight vector is based on other compressed gradient weights from a second group of second learning entities of the distributed machine learning system, and wherein the first group of second learning entities is less than an entirety of second learning entities of the distributed machine learning system and is different from the second group of second learning entities;

a compression component that generates a second concatenated compressed gradient weight vector for the first learning entity, based on the compressed gradient weights and not any of the other compressed gradient weights employed to generate the first concatenated compressed gradient weight vector, to update a weight of the first learning entity, wherein the second concatenated compressed gradient weight vector comprises a concatenation of the compressed gradient weights; and

a transmit component that transmits, via a network, to the first learning entity:

a first transmission comprising information identifying respective vector lengths of the compressed gradient weights in the second concatenated compressed gradient weight vector, wherein the second concatenated compressed gradient weight vector comprises a plurality of compressed gradient weights, at least two of the respective vector lengths are different, and the first learning entity employs the respective vector lengths to identify the compressed gradient weights in the second concatenated compressed gradient weight vector, and

a second transmission comprising the second concatenated compressed gradient weight vector to initiate the first learning entity to update the weight of the first learning entity using the second concatenated compressed gradient weight vector.
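
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify how the pointer component tracks what was already sent, nor which compression scheme is used. The sketch assumes an arrival-ordered log with one index per receiving learner, and it uses short float lists as stand-ins for compressed gradient weights. The names `Aggregator`, `receive`, `build_transmissions`, `split_by_lengths`, and `learner_0` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Aggregator:
    """Hypothetical server-side store of compressed gradients (assumption, not from the claim)."""
    # Arrival-ordered compressed gradient weights from the sending learners.
    log: List[List[float]] = field(default_factory=list)
    # Per-receiver index of the first log entry not yet sent to that receiver.
    pointers: Dict[str, int] = field(default_factory=dict)

    def receive(self, compressed: List[float]) -> None:
        """Record one compressed gradient from a sending learner."""
        self.log.append(list(compressed))

    def build_transmissions(self, receiver: str) -> Tuple[List[int], List[float]]:
        """Return (vector lengths, concatenated vector) for entries not yet sent."""
        # Pointer role: select only entries absent from the previous concatenated vector.
        fresh = self.log[self.pointers.get(receiver, 0):]
        # First transmission: the length of each compressed gradient.
        lengths = [len(g) for g in fresh]
        # Second transmission: the concatenation of those compressed gradients.
        concatenated = [x for g in fresh for x in g]
        # Advance the pointer so the next call excludes what was just packaged.
        self.pointers[receiver] = len(self.log)
        return lengths, concatenated


def split_by_lengths(lengths: List[int], concatenated: List[float]) -> List[List[float]]:
    """Receiver side: recover the individual compressed gradients from the lengths."""
    parts, start = [], 0
    for n in lengths:
        parts.append(concatenated[start:start + n])
        start += n
    return parts


agg = Aggregator()
# A second group of learners reports first; its gradients form the first vector.
agg.receive([0.5, -0.25])
agg.receive([1.0])
lengths1, vec1 = agg.build_transmissions("learner_0")

# A different first group reports later; only its gradients go in the second vector.
agg.receive([0.125, 0.25, -0.5])
agg.receive([2.0])
lengths2, vec2 = agg.build_transmissions("learner_0")
recovered = split_by_lengths(lengths2, vec2)
```

In this sketch the second concatenated vector contains none of the gradients used for the first one, and the differing lengths (3 and 1) are what let the receiver separate the entries again before applying its weight update.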