US 12,333,392 B2
Data de-identification using semantic equivalence for machine learning
Gandhi Sivakumar, Bentleigh (AU); Lynn Kwok, Bundoora (AU); Kushal S. Patel, Pune (IN); and Sarvesh S. Patel, Pune (IN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 12, 2021, as Appl. No. 17/318,022.
Prior Publication US 2022/0366294 A1, Nov. 17, 2022
Int. Cl. G06F 3/0482 (2013.01); G06F 16/23 (2019.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 16/2379 (2019.01); G06F 40/30 (2020.01)]
20 Claims
 
1. A computer-implemented method comprising:
  detecting a set of personal information data corresponding to a set of users in a set of training data;
  transforming the set of training data into a set of semantically equivalent training data by replacing the set of personal information data with a set of semantic equivalent data, wherein the set of semantic equivalent data contains de-identified data with dimension retention;
  training a machine learning model using the set of semantically equivalent training data, comprising:
    loading metadata mapper objects into an object cache;
    loading a user identity and associated access permissions into the metadata mapper objects; and
    responsive to a polled thread receiving a response, requesting access to the semantically equivalent training data based on the cached metadata mapper objects; and
  responsive to a runtime query from a client device, transforming personal information within the runtime query into semantic equivalencies, wherein the personal information is replaced with a semantic proximate associated with the set of semantic equivalent data and transmitted to the trained machine learning model.
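
The detecting, transforming, and runtime-query steps recited above can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how personal information is detected or how semantic equivalents are chosen, so the sketch makes assumptions: regular expressions stand in for detection, and a salted hash stands in for the mapping to a surrogate value. "Dimension retention" is read here as keeping the data type and format (an email stays an email, a phone number stays a phone number). All names (`EMAIL_RE`, `PHONE_RE`, `surrogate_email`, `surrogate_phone`, `deidentify`, `transform_training_data`, `transform_query`) and the `demo-salt` value are hypothetical. The metadata mapper cache and polled-thread steps are not modeled.

```python
import hashlib
import re

# Assumed detectors: simple patterns for emails and NNN-NNN-NNNN phone numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def _digest(value: str, salt: str) -> bytes:
    """Salted SHA-256 digest used to derive a repeatable surrogate."""
    return hashlib.sha256((salt + value).encode("utf-8")).digest()


def surrogate_email(value: str, salt: str) -> str:
    """Map an email to a surrogate that is still a well-formed email."""
    token = _digest(value.lower(), salt).hex()[:10]
    return f"user_{token}@example.com"


def surrogate_phone(value: str, salt: str) -> str:
    """Map a phone number to a surrogate in the same NNN-NNN-NNNN format."""
    digits = "".join(str(b % 10) for b in _digest(value, salt)[:7])
    return f"555-{digits[:3]}-{digits[3:7]}"


def deidentify(text: str, salt: str = "demo-salt") -> str:
    """Replace detected personal information with format-preserving surrogates."""
    text = EMAIL_RE.sub(lambda m: surrogate_email(m.group(0), salt), text)
    text = PHONE_RE.sub(lambda m: surrogate_phone(m.group(0), salt), text)
    return text


def transform_training_data(records: list[str], salt: str = "demo-salt") -> list[str]:
    """Produce the de-identified training set (stand-in for the transforming step)."""
    return [deidentify(r, salt) for r in records]


def transform_query(query: str, salt: str = "demo-salt") -> str:
    """Apply the same mapping to a runtime query before it reaches the model."""
    return deidentify(query, salt)


if __name__ == "__main__":
    training = ["Contact jane.doe@corp.com or 212-555-0187 about the claim."]
    print(transform_training_data(training)[0])
    print(transform_query("What is the status for jane.doe@corp.com?"))
```

Because the mapping in this sketch is deterministic for a given salt, the surrogate that replaces an email in the training data is the same one that replaces it in a later query, which is one possible way to keep the runtime query consistent with what the model saw during training.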