US 11,785,022 B2
Building a Machine Learning model without compromising data privacy
Changsha Ma, Campbell, CA (US); Dianhuan Lin, Sunnyvale, CA (US); Rex Shang, Los Altos, CA (US); Douglas A. Koch, Santa Clara, CA (US); Dong Guo, San Jose, CA (US); and Howie Xu, Palo Alto, CA (US)
Assigned to Zscaler, Inc., San Jose, CA (US)
Filed by Zscaler, Inc., San Jose, CA (US)
Filed on Jun. 16, 2020, as Appl. No. 16/902,707.
Prior Publication US 2021/0392147 A1, Dec. 16, 2021
Int. Cl. H04L 9/40 (2022.01); G06N 20/00 (2019.01)
CPC H04L 63/1416 (2013.01) [G06N 20/00 (2019.01); H04L 63/145 (2013.01); H04L 63/1425 (2013.01)]
20 Claims
 
1. A non-transitory computer-readable storage medium having computer-readable code stored thereon for programming one or more processors to perform steps of:
obtaining file identifiers associated with files in production data, wherein the production data is live, real traffic that includes both benign and malicious data;
obtaining lab data from one or more public repositories of malware samples based on similarities of the file identifiers obtained from the live, real production data and the lab data;
deriving weights based on similarities of the production data and the obtained lab data; and
utilizing the derived weights and the obtained lab data as training data for training a machine learning process for classifying malware in the production data, wherein the training includes utilizing the derived weights for improving a recall of the machine learning process.
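
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything specific in it is an assumption made for illustration: file identifiers are treated as similarity-preserving strings compared by Jaccard overlap of character trigrams, the weight of each lab sample is its best similarity to any production identifier, and the classifier is a small weighted logistic regression. The names `select_and_weight`, `train_weighted_logreg`, `predict`, the `threshold` value, and the sample data are all hypothetical.

```python
from math import exp

def trigrams(identifier):
    """Character trigrams of an identifier (assumed similarity-preserving digest)."""
    return {identifier[i:i + 3] for i in range(len(identifier) - 2)}

def jaccard(a, b):
    """Jaccard similarity of two sets, 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_and_weight(production_ids, lab_samples, threshold=0.2):
    """Keep lab samples that resemble production identifiers.

    Only identifiers are read from production, never file contents.
    The weight is the best similarity to any production identifier
    (an illustrative choice, not taken from the claim).
    """
    prod_grams = [trigrams(p) for p in production_ids]
    selected = []
    for sample in lab_samples:
        grams = trigrams(sample["id"])
        weight = max((jaccard(grams, g) for g in prod_grams), default=0.0)
        if weight >= threshold:
            selected.append((sample, weight))
    return selected

def train_weighted_logreg(weighted_samples, epochs=200, lr=0.5):
    """Logistic regression by per-sample gradient steps scaled by sample weight."""
    n = len(weighted_samples[0][0]["features"])
    coef, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for sample, weight in weighted_samples:
            x, y = sample["features"], sample["label"]
            z = bias + sum(c * xi for c, xi in zip(coef, x))
            pred = 1.0 / (1.0 + exp(-z))
            step = lr * weight * (y - pred)  # heavier samples move the model more
            coef = [c + step * xi for c, xi in zip(coef, x)]
            bias += step
    return coef, bias

def predict(model, x):
    """Return 1 (malware) when the linear score is positive, else 0 (benign)."""
    coef, bias = model
    return 1 if bias + sum(c * xi for c, xi in zip(coef, x)) > 0 else 0

# Hypothetical identifiers observed in production traffic.
production_ids = ["abcdef123456", "zzzyyyxxx000"]

# Hypothetical labeled lab samples (label 1 = malware, 0 = benign).
lab_samples = [
    {"id": "abcdef123999", "features": [0.9, 0.8], "label": 1},
    {"id": "abcdef000456", "features": [0.1, 0.2], "label": 0},
    {"id": "zzzyyyxxx111", "features": [0.8, 0.9], "label": 1},
    {"id": "qqqqwwwweeee", "features": [0.5, 0.5], "label": 0},
]

selected = select_and_weight(production_ids, lab_samples)
model = train_weighted_logreg(selected)
```

In this sketch the fourth lab sample shares no trigrams with any production identifier, so it is dropped before training, and the remaining samples influence the model in proportion to how closely they resemble what production actually sees. Whether such weighting improves recall in practice depends on the data and would need to be measured on a labeled evaluation set.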