US 12,462,058 B2
Sensitive-data-aware encoding
Debasis Ganguly, Dublin (IE); Martin Gleize, Dublin (IE); Pierpaolo Tommasi, Dublin (IE); and Yufang Hou, Dublin (IE)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 14, 2021, as Appl. No. 17/320,488.
Prior Publication US 2022/0366074 A1, Nov. 17, 2022
Int. Cl. G06F 21/62 (2013.01); G06F 18/23 (2023.01); G06N 20/20 (2019.01); G06V 10/75 (2022.01)
CPC G06F 21/6245 (2013.01) [G06F 18/23 (2023.01); G06N 20/20 (2019.01); G06V 10/751 (2022.01)]
20 Claims
 
1. A computer-implemented method, comprising:
receiving, by a computing device, a data space;
identifying, by one or more learning models and based on the data space, a first set of features that are relevant to sensitive information;
defining a sensitive subspace within the data space, the sensitive subspace including the first set of features;
calculating a confidence score for the first set of features, wherein the confidence score represents a likelihood that sensitive data can be discerned from the first set of features;
determining the first set of features is vulnerable based on the confidence score being above a vulnerability threshold;
identifying, by the one or more learning models and based on the data space, a second set of features that are relevant to a goal task, wherein the goal task is related to an output of a learning model of the one or more learning models of a provider;
defining a goal subspace within the data space, the goal subspace including the second set of features, wherein the goal task is a task completed by a learning model using the goal subspace;
identifying a first subset of features, wherein the first subset of features comprises data included within the sensitive subspace but not included within the goal subspace;
pruning, by the computing device and in response to the identifying the first subset of features, the data space, the pruning including removing the first subset of features from the data space, resulting in a pruned data space, wherein the pruning includes encoding the pruned data space; and
transmitting, by the computing device, the pruned data space to the provider.
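
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify how the learning models score features, how the confidence score is computed, or which encoding is used, so everything below is an assumption made for illustration: per-feature relevance scores stand in for the learning models, the confidence score is taken as the mean sensitive relevance, JSON stands in for the encoding, and all names (select_features, confidence_score, prepare_for_provider, the cutoff and threshold values) are hypothetical.

```python
from __future__ import annotations

import json


def select_features(relevance: dict[str, float], cutoff: float) -> frozenset[str]:
    """Stand-in for a learning model: keep features scored at or above cutoff."""
    return frozenset(name for name, score in relevance.items() if score >= cutoff)


def confidence_score(features: frozenset[str], relevance: dict[str, float]) -> float:
    """Assumed scoring rule: mean relevance of the selected features."""
    if not features:
        return 0.0
    return sum(relevance[name] for name in features) / len(features)


def prepare_for_provider(
    data_space: dict[str, list],
    sensitive_relevance: dict[str, float],
    goal_relevance: dict[str, float],
    cutoff: float = 0.5,
    vulnerability_threshold: float = 0.6,
) -> tuple[bytes, frozenset[str]]:
    """Return the encoded (possibly pruned) data space and the removed features."""
    # Sensitive subspace and goal subspace, each defined by a set of features.
    sensitive = select_features(sensitive_relevance, cutoff)
    goal = select_features(goal_relevance, cutoff)

    # Vulnerable only if the confidence score exceeds the threshold.
    vulnerable = confidence_score(sensitive, sensitive_relevance) > vulnerability_threshold

    # First subset: in the sensitive subspace but not in the goal subspace.
    # Assumption: when not vulnerable, nothing is removed.
    removed = sensitive - goal if vulnerable else frozenset()

    pruned = {name: values for name, values in data_space.items() if name not in removed}

    # Assumed encoding: deterministic JSON serialized to UTF-8 bytes.
    payload = json.dumps(pruned, sort_keys=True).encode("utf-8")
    return payload, removed


# Illustrative data; feature names and scores are invented for this example.
data_space = {
    "name": ["A", "B"],
    "zip_code": ["D02", "D08"],
    "age": [34, 51],
    "heart_rate": [72, 88],
}
sensitive_relevance = {"name": 0.95, "zip_code": 0.9, "age": 0.7, "heart_rate": 0.2}
goal_relevance = {"name": 0.0, "zip_code": 0.1, "age": 0.8, "heart_rate": 0.9}

payload, removed = prepare_for_provider(data_space, sensitive_relevance, goal_relevance)
```

In this example, "age" is relevant to both the sensitive information and the goal task, so it stays in the pruned data space. Only "name" and "zip_code", which are sensitive but not needed for the goal task, are removed before the payload would be transmitted to the provider.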