CPC G06N 20/00 (2019.01)
18 Claims

1. A method, comprising:
mapping a set of environment constraints to a dataset distillation process; and
performing the dataset distillation process based upon the mapping in a distributed manner by a group of edge nodes and a central node with which the edge nodes communicate,
wherein the dataset distillation process comprises operations including:
transmitting, from the central node to each of the edge nodes, a current set of distilled data;
optimizing, at each of the edge nodes, a machine learning model, using the current set of distilled data, to define an optimized machine learning model;
receiving, by the central node from each of the edge nodes, a respective updated distillation loss, wherein the updated distillation loss was generated by a loss evaluation process performed at the edge node with respect to original data from which the distilled data was obtained, and with respect to the optimized machine learning model;
receiving, by the central node from each of the edge nodes, a respective updated learning rate that was obtained by the edge node using a gradient computation with respect to the distilled data and a current learning rate of the optimized machine learning model;
aggregating, by the central node, the updated distillation losses and the updated learning rates received from the edge nodes;
distilling, by the central node, the current set of distilled data into a new set of distilled data based on the aggregated distillation losses and the aggregated learning rates, wherein the new set of distilled data is not transmitted from each of the edge nodes to the central node;
adding a new edge node to the group of edge nodes;
transmitting, from the central node to the new edge node, the new set of distilled data; and
optimizing, at the new edge node, a machine learning model, using the new set of distilled data.
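The following is a minimal, hypothetical sketch of how one round of the claimed distributed dataset distillation process could look, assuming gradient-based dataset distillation in the style of Wang et al. (arXiv:1811.10959). The claim leaves the aggregation and update rules open; the sketch assumes plain averaging, a single differentiable SGD step per edge node, and that each edge node also shares the gradient of its loss with respect to the distilled data, since the central update step needs some such signal. All names (EdgeNode, central_round, make_model) are illustrative and not taken from the patent.

# Hypothetical sketch only; not the patented implementation.
import torch
import torch.nn.functional as F


def make_model():
    # Fresh model each round; the architecture is an assumption.
    return torch.nn.Linear(10, 2)


class EdgeNode:
    """Holds original data that never leaves the edge node."""

    def __init__(self, original_x, original_y):
        self.x, self.y = original_x, original_y

    def local_update(self, distilled_x, distilled_y, lr, meta_lr=0.1):
        # Leaf copies so autograd can differentiate with respect to the
        # distilled data and the current learning rate.
        dx = distilled_x.detach().clone().requires_grad_(True)
        lr = lr.detach().clone().requires_grad_(True)

        # "Optimizing ... a machine learning model, using the current set
        # of distilled data": one differentiable SGD step.
        model = make_model()
        w, b = model.weight, model.bias
        inner = F.cross_entropy(F.linear(dx, w, b), distilled_y)
        gw, gb = torch.autograd.grad(inner, (w, b), create_graph=True)
        w, b = w - lr * gw, b - lr * gb  # the optimized model

        # "Loss evaluation ... with respect to original data": the updated
        # distillation loss for this edge node.
        loss = F.cross_entropy(F.linear(self.x, w, b), self.y)

        # "Gradient computation with respect to the distilled data and a
        # current learning rate": the edge node updates its learning rate
        # locally and (an assumption) also returns the gradient on the
        # distilled data for the central update.
        g_dx, g_lr = torch.autograd.grad(loss, (dx, lr))
        return loss.detach(), g_dx, lr.detach() - meta_lr * g_lr


def central_round(edge_nodes, distilled_x, distilled_y, lr, meta_lr=0.1):
    # The central node broadcasts the current distilled data, then
    # aggregates the returned losses, gradients, and learning rates.
    # Plain averaging is an assumption; the claim only says "aggregating".
    losses, grads, lrs = zip(*(n.local_update(distilled_x, distilled_y, lr)
                               for n in edge_nodes))
    agg_loss = torch.stack(losses).mean()
    agg_grad = torch.stack(grads).mean(dim=0)
    agg_lr = torch.stack(lrs).mean()

    # "Distilling ... into a new set of distilled data": a gradient step
    # on the distilled data itself; no original data reaches the center.
    new_distilled_x = distilled_x - meta_lr * agg_grad
    return new_distilled_x, agg_lr, agg_loss


if __name__ == "__main__":
    edges = [EdgeNode(torch.randn(32, 10), torch.randint(0, 2, (32,)))
             for _ in range(3)]
    dx, dy = torch.randn(4, 10), torch.tensor([0, 1, 0, 1])
    lr = torch.tensor(0.02)
    for _ in range(5):
        dx, lr, loss = central_round(edges, dx, dy, lr)
    # A newly added edge node only ever receives the small distilled set,
    # mirroring the final limitations of the claim.
    newcomer = EdgeNode(torch.randn(32, 10), torch.randint(0, 2, (32,)))
    newcomer.local_update(dx, dy, lr)

Note the division of labor the sketch illustrates: only the small distilled set, scalar losses, gradients, and learning rates cross the network, while each node's original data stays local, which is what lets a late-joining edge node be brought up to date by transmitting the new distilled set alone.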