US 11,783,031 B1
Systems and methods for utilizing federated machine-learning to protect against potentially malicious data
Yufei Han, Antibes (FR); Lella Bilge, Antibes (FR); and Chris Gates, Mountain View, CA (US)
Assigned to GEN DIGITAL INC., Tempe, AZ (US)
Filed by GEN DIGITAL INC., Tempe, AZ (US)
Filed on Mar. 31, 2020, as Appl. No. 16/836,791.
Int. Cl. G06F 21/55 (2013.01); G06N 3/08 (2023.01)
CPC G06F 21/554 (2013.01) [G06N 3/08 (2013.01); G06F 2221/034 (2013.01)]
20 Claims
 
1. A computer-implemented method for utilizing federated machine-learning to protect against potentially malicious data, at least a portion of the method being performed by one or more computing devices comprising at least one processor, the method comprising:
arranging, by the one or more computing devices, a plurality of client devices into groups for applying a federated machine-learning model;
determining, by the one or more computing devices, model updates for each of the groups over a predetermined period;
training, by the one or more computing devices, one or more recurrent neural networks to derive a low-dimensional representation of the model updates;
calculating, by the one or more computing devices, a data quality score for each of the client devices based on the model updates, wherein the model updates comprise a plurality of sequential model updates over a plurality of consecutive time periods, wherein the sequential model updates are utilized as reference model update paths for determining device level quality assessments for each of the client devices, the data quality score comprising a measurement indicating a presence of noisy data associated with the client devices when a change in a set of the reference model update paths in at least one group of the sequential model updates differs from another set of the reference model update paths in at least one other group of the sequential model updates, the at least one other group of the sequential model updates representing normal sequential updates corresponding to noise-free data and corrupted data having a marginal impact;
applying, by the one or more computing devices, the federated machine-learning model to classify data instances on each of the client devices as comprising at least one of clean data and potentially corrupt data, wherein applying the federated machine-learning model to classify data instances comprises applying the federated machine-learning model to determine a classification margin value for each of the data instances with respect to a classification boundary, wherein each of a plurality of the data instances that are farthest away from either of a left side and a right side of the classification boundary are identified as clean data instances and wherein each of a plurality of the data instances that are closest to either of the left side and the right side of the classification boundary are identified as potentially corrupt data instances; and
performing, by the one or more computing devices, a security action that protects against the potentially malicious data by tagging the data instances classified as the potentially corrupt data.
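The scoring, classifying, and tagging steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on several assumptions. The function names (`path_deviation_scores`, `classification_margin`, `tag_by_margin`), the tag strings, and the cutoff `k` are hypothetical. A per-step median across groups stands in for the claimed recurrent-neural-network representation of the reference model update paths. A linear boundary `w.x + b = 0` stands in for the federated machine-learning model.

```python
from statistics import median


def path_deviation_scores(group_paths):
    """Score each group's sequential update path against a reference path.

    group_paths maps a group id to a list of per-period update magnitudes
    (an assumed scalar summary of each model update). The reference path is
    the per-period median across groups, a simple stand-in for a learned
    low-dimensional representation. A higher score suggests noisier data.
    """
    steps = len(next(iter(group_paths.values())))
    reference = [median(p[t] for p in group_paths.values()) for t in range(steps)]
    return {
        group: sum(abs(path[t] - reference[t]) for t in range(steps)) / steps
        for group, path in group_paths.items()
    }


def classification_margin(weights, bias, x):
    """Signed margin of instance x relative to the assumed linear boundary."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def tag_by_margin(instances, weights, bias, k):
    """Tag the k instances closest to the boundary (on either side) as
    potentially corrupt and the k farthest as clean; others stay unlabeled."""
    if k < 0 or 2 * k > len(instances):
        raise ValueError("k must satisfy 0 <= 2*k <= number of instances")
    distances = [abs(classification_margin(weights, bias, x)) for x in instances]
    order = sorted(range(len(instances)), key=lambda i: distances[i])
    tags = ["unlabeled"] * len(instances)
    for i in order[:k]:
        tags[i] = "potentially_corrupt"
    for i in order[len(order) - k:]:
        tags[i] = "clean"
    return tags
```

For example, under these assumptions, `tag_by_margin([(0.1,), (-0.2,), (3.0,), (-4.0,), (1.0,)], (1.0,), 0.0, 2)` tags the first two instances as potentially corrupt, the next two as clean, and leaves the last unlabeled. Using the absolute margin treats both sides of the boundary symmetrically, which mirrors the "either of the left side and the right side" language of the claim.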