US 12,333,394 B2
Privacy-preserving labeling and classification of email
Yi Luo, Redmond, WA (US); Weigsheng Li, Redmond, WA (US); Sharada Shirish Acharya, Seattle, WA (US); Mainak Sen, Palo Alto, CA (US); Ravi Kiran Reddy Poluri, Redmond, WA (US); and Christian Rudnick, Seattle, WA (US)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed on Oct. 31, 2022, as Appl. No. 18/051,217.
Application 18/051,217 is a division of application No. 16/049,579, filed on Jul. 30, 2018, granted, now 11,521,108.
Prior Publication US 2023/0077990 A1, Mar. 16, 2023
Int. Cl. G06N 20/00 (2019.01); H04L 51/42 (2022.01)

CPC G06N 20/00 (2019.01) [H04L 51/42 (2022.05)]

19 Claims

1. A method of labeling email, the method comprising:

receiving an unlabeled email;

identifying a feature and a second feature of the unlabeled email, wherein the feature and the second feature do not include personally identifiable information (“PII”) in a body of the unlabeled email;

receiving a labeled cluster comprising an email-category label and seed data;

assigning the email-category label to the unlabeled email based on a first derivative edge in an expansion graph thereby creating a labeled email, wherein the first derivative edge is a directional edge from the labeled cluster to the feature and the first derivative edge represents first inference logic that the email-category label associated with the labeled cluster is also associated with the unlabeled email;

including the labeled email in a training dataset;

assigning the email-category label to the second feature based on a clustering edge in the expansion graph, wherein the clustering edge is a directional edge from the feature to the second feature and the clustering edge represents second inference logic that a label associated with the feature is also associated with the second feature;

assigning the email-category label to a second unlabeled email based on a second derivative edge in the expansion graph thereby creating a second labeled email, wherein the second derivative edge is a directional edge from the second feature to the second unlabeled email and the second derivative edge represents third inference logic that the email-category label associated with the second feature is also associated with the second unlabeled email;

including the second labeled email in the training dataset;

training a machine learning model to classify email with the training dataset;

classifying a received email with the machine learning model as spam; and

quarantining the received email on a server without downloading to a local computer.
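
The label-expansion steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the expansion graph is a list of directed edges and that a label flows breadth-first from a labeled cluster to features and then to emails. The node names, the `propagate_labels` function, and the feature-to-email edge for the first email are hypothetical and chosen only for illustration. The features are shown as opaque hashes to reflect the requirement that they carry no PII.

```python
from collections import deque

# Hypothetical edge kinds; the claim names them, the constants are ours.
DERIVATIVE = "derivative"
CLUSTERING = "clustering"


def propagate_labels(edges, seed_labels):
    """Push labels along directed edges, breadth-first.

    edges: iterable of (source, target, kind) tuples.
    seed_labels: dict mapping labeled-cluster nodes to an email-category label.
    Returns a dict mapping every reachable node to the label it inherited.
    The edge kind is kept for readability only; this sketch treats both
    kinds identically when propagating.
    """
    adjacency = {}
    for source, target, _kind in edges:
        adjacency.setdefault(source, []).append(target)

    labels = dict(seed_labels)
    queue = deque(seed_labels)
    while queue:
        node = queue.popleft()
        for target in adjacency.get(node, []):
            if target not in labels:  # the first label to arrive wins
                labels[target] = labels[node]
                queue.append(target)
    return labels


# Assumed example graph: one labeled cluster, two PII-free features, two emails.
edges = [
    ("cluster:spam", "feature:sender_domain_hash", DERIVATIVE),
    # Assumption: the email that exhibits the feature is modeled as an edge.
    ("feature:sender_domain_hash", "email:1", DERIVATIVE),
    ("feature:sender_domain_hash", "feature:template_hash", CLUSTERING),
    ("feature:template_hash", "email:2", DERIVATIVE),
]

labels = propagate_labels(edges, {"cluster:spam": "spam"})

# Only email nodes go into the training dataset.
training_dataset = {
    node: label for node, label in labels.items() if node.startswith("email:")
}
```

Under these assumptions, both `email:1` and `email:2` inherit the `spam` label from the seed cluster, and `training_dataset` could then be passed to any classifier-training routine. The training, classifying, and server-side quarantining steps are outside the scope of this sketch.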