US 11,941,520 B2
Hyperparameter determination for a differentially private federated learning process
Colin Sutcher-Shepard, Troy, NY (US); Ashish Verma, Nanuet, NY (US); Jayaram Kallapalayam Radhakrishnan, Pleasantville, NY (US); and Gegi Thomas, Danbury, CT (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jan. 9, 2020, as Appl. No. 16/738,114.
Prior Publication US 2021/0216902 A1, Jul. 15, 2021
Int. Cl. G06N 3/08 (2023.01)
CPC G06N 3/08 (2013.01) 25 Claims
 
1. A system, comprising:
a memory that stores computer executable components; and
a processor, operably coupled to the memory, and that executes the computer executable components stored in the memory, wherein the computer executable components comprise:
a hyperparameter advisory component that iteratively trains an overall machine learning model of a differentially private federated learning process, wherein the training comprises, at each iteration:
determining respective values of a hyperparameter for machine learning models distributed on computing devices based on respective privacy budgets, respective learning rate schedules, and respective batch sizes associated with the machine learning models, wherein the respective values of the hyperparameter indicate respective amounts of noise to introduce to respective derivatives of the machine learning models from training to achieve respective defined amounts of privacy of respective training data employed for the training of the machine learning models;
transmitting the respective values of the hyperparameter to the computing devices to train the machine learning models and introduce the respective amounts of noise to the respective derivatives of the machine learning models;
receiving the respective derivatives of the machine learning models from the computing devices; and
aggregating the respective derivatives of the machine learning models to update the overall machine learning model, wherein the respective derivatives comprise at least model weights.
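Claim 1 says the advisory component derives each client's noise value from its privacy budget, learning rate schedule and batch size, but it does not say how. The following is a minimal sketch, assuming a Gaussian noise multiplier applied DP-SGD style (per-example clipping, then noise on the summed gradient). It also assumes a zero-concentrated DP (zCDP) accountant that charges every local step with no subsampling amplification and uses add/remove-one adjacency, which is conservative; this accountant is a swapped-in technique, not the patent's stated method. Further assumptions: a least-squares linear model stands in for the client models, the schedule lists one learning rate per local epoch, and the server aggregates returned weights FedAvg-style. All function names (`noise_multiplier`, `local_train`, `train`) and client dictionary keys are hypothetical.

```python
import math

import numpy as np


def noise_multiplier(epsilon, delta, lr_schedule, batch_size, num_examples, num_rounds):
    """Hypothetical advisor rule: Gaussian noise multiplier for one client.

    Assumptions: lr_schedule holds one learning rate per local epoch, so its
    length is the epoch count. Every step across all rounds is charged via
    zCDP composition with no subsampling amplification (conservative).
    """
    steps = num_rounds * len(lr_schedule) * math.ceil(num_examples / batch_size)
    log_inv_delta = math.log(1.0 / delta)
    # Largest total rho such that rho + 2*sqrt(rho*ln(1/delta)) <= epsilon.
    rho = (math.sqrt(log_inv_delta + epsilon) - math.sqrt(log_inv_delta)) ** 2
    # Each Gaussian step with multiplier sigma costs 1 / (2 * sigma**2) of rho.
    return math.sqrt(steps / (2.0 * rho))


def local_train(w, X, y, sigma, lr_schedule, batch_size, clip_norm, rng):
    """Client side: DP-SGD on a least-squares linear model (stand-in model)."""
    w = w.copy()
    n = len(X)
    for lr in lr_schedule:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Per-example gradients of 0.5 * (x . w - y) ** 2.
            grads = (X[idx] @ w - y[idx])[:, None] * X[idx]
            # Clip each per-example gradient to L2 norm <= clip_norm.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip_norm)
            # Gaussian noise with std sigma * clip_norm on the summed gradient.
            noise = rng.normal(0.0, sigma * clip_norm, size=w.shape)
            w -= lr * (grads.sum(axis=0) + noise) / len(idx)
    return w


def train(clients, dim, num_rounds, clip_norm=1.0, seed=0):
    """Server side: advise sigma per client, collect weights, aggregate."""
    rng = np.random.default_rng(seed)
    global_w = np.zeros(dim)
    for _ in range(num_rounds):
        new_weights, sizes = [], []
        for c in clients:
            # Advisor step: per-client noise from that client's own settings.
            sigma = noise_multiplier(c["epsilon"], c["delta"], c["lr_schedule"],
                                     c["batch_size"], len(c["X"]), num_rounds)
            new_weights.append(local_train(global_w, c["X"], c["y"], sigma,
                                           c["lr_schedule"], c["batch_size"],
                                           clip_norm, rng))
            sizes.append(len(c["X"]))
        # FedAvg-style aggregation weighted by local dataset size.
        global_w = np.average(np.stack(new_weights), axis=0, weights=sizes)
    return global_w
```

Because the advisor's inputs do not change between rounds, computing sigma once per client would give the same result. The sketch recomputes it each round only to mirror the claim's per-iteration wording. Passing `num_rounds` into the advisor spreads each client's budget over the whole training run rather than spending it in a single round.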