US 12,321,483 B2
Augmented privacy datasets using semantic based data linking
Stefano Braghin, Dublin (IE); Killian Levacher, Dublin (IE); Christian Pinto, Dublin (IE); and Marco Simioni, Dublin (IE)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 23, 2020, as Appl. No. 17/101,470.
Prior Publication US 2022/0164471 A1, May 26, 2022
Int. Cl. G06F 16/21 (2019.01); G06F 21/62 (2013.01); G06F 40/216 (2020.01); G06F 40/30 (2020.01)

CPC G06F 21/6245 (2013.01) [G06F 16/213 (2019.01); G06F 40/216 (2020.01); G06F 40/30 (2020.01)]

9 Claims

1. A computer-implemented method comprising:

receiving a target dataset comprising a plurality of subsets corresponding to a plurality of entities, with a given subset including information indicative of at least one attribute of a given entity;

for the given subset, determining semantic representations corresponding to the at least one attribute of the given entity;

iteratively augmenting the target dataset, using the determined semantic representations of the target dataset as initial parameters, until determining that the identified auxiliary datasets contain no semantic representation of the given entity corresponding to an attribute previously omitted from the target dataset, the augmenting including:

identifying auxiliary datasets including information indicative of attributes corresponding to the given entity based, at least in part, on the determined semantic representations of the target dataset,

determining semantic representations of the attributes of the given entity present in the auxiliary datasets,

determining that at least one semantic representation of the given entity present in the identified auxiliary datasets corresponds to an attribute previously omitted from the target dataset,

augmenting the target dataset with the determined at least one semantic representation corresponding to the previously omitted attribute, wherein augmenting the target dataset further comprises appending a subset that is added to the target dataset with the previously omitted attribute, and

iteratively identifying additional attributes using the augmented target dataset, wherein the iteratively identifying further comprises using a semantic representation of the previously omitted attribute associated with the subset in a search to identify the additional attributes present in the auxiliary datasets;

generating an identification score for the given entity based, at least in part, on an amount of the additional attributes added to the augmented target dataset, wherein the identification score is incremented based on an accumulation of each one of the additional attributes;

managing data privacy controls for the target dataset based, at least in part, on the identification score; and

redacting at least one attribute corresponding to the given entity from the received target dataset based, at least in part, on the identification score.
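The iterative augmentation recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how semantic representations are computed or how auxiliary records are linked to an entity, so the following are assumptions. First, a hypothetical lookup table `SEMANTIC_LABELS` maps raw attribute names to canonical labels. Second, an auxiliary record is linked whenever it shares a known (label, value) pair with the entity. The hypothetical function `augment_entity` repeats passes over the auxiliary datasets. Each attribute discovered in one pass becomes a search key in the next, and the loop stops when a pass finds no previously omitted attribute.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical mapping from raw attribute names to canonical semantic labels.
# It stands in for whatever semantic matcher (ontology, embeddings) is used.
SEMANTIC_LABELS: Dict[str, str] = {
    "name": "person_name", "full_name": "person_name",
    "zip": "postal_code", "postcode": "postal_code",
    "dob": "birth_date", "birthday": "birth_date",
    "employer": "employer", "company": "employer",
}


def semantic_representation(record: Record) -> Dict[str, str]:
    """Map each attribute of a record to its canonical semantic label."""
    return {SEMANTIC_LABELS.get(k, k): v for k, v in record.items()}


def augment_entity(subset: Record, auxiliary: List[List[Record]]) -> Dict[str, str]:
    """Link auxiliary records to one entity until no new attributes appear."""
    known = semantic_representation(subset)  # initial parameters from the target subset
    while True:
        added: Dict[str, str] = {}
        for dataset in auxiliary:
            for record in dataset:
                rep = semantic_representation(record)
                # Assumed linking rule: the record shares a known (label, value) pair.
                if any(known.get(label) == value for label, value in rep.items()):
                    for label, value in rep.items():
                        if label not in known and label not in added:
                            added[label] = value  # previously omitted attribute
        if not added:  # stop: this pass found no previously omitted attribute
            return known
        known.update(added)  # new attributes become search keys for the next pass
```

For example, take a target subset `{"name": "A. Smith", "zip": "D02"}` and two auxiliary datasets. The first links on the name and contributes a birth date. The second can only be linked through that birth date, so its employer attribute is found on a later pass. This chaining is the behavior the final "iteratively identifying" step of the claim describes.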
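The scoring and redaction steps of claim 1 can be sketched under further assumptions. The claim only states that the identification score is incremented for each additional attribute added to the augmented target dataset and that redaction depends on that score. The function names `identification_score` and `apply_privacy_control` are hypothetical. So are the unit weight per attribute, the `threshold` value, the `redact_keys` policy and the `"[REDACTED]"` placeholder, none of which comes from the patent.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def identification_score(added_attributes: List[str]) -> int:
    """Increment the score once per additional attribute linked to the entity."""
    score = 0
    for _ in added_attributes:
        score += 1  # hypothetical unit weight per accumulated attribute
    return score


def apply_privacy_control(
    subset: Record,
    added_attributes: List[str],
    threshold: int = 2,                           # hypothetical risk threshold
    redact_keys: Tuple[str, ...] = ("name", "zip"),  # hypothetical redaction policy
) -> Tuple[int, Record]:
    """Return the score and a copy of the subset, redacted if the score reaches the threshold."""
    score = identification_score(added_attributes)
    if score < threshold:
        return score, dict(subset)  # low re-identification risk: leave the subset as received
    redacted = {k: ("[REDACTED]" if k in redact_keys else v) for k, v in subset.items()}
    return score, redacted
```

With two linked attributes (for example a birth date and an employer), the score reaches the assumed threshold of 2. The `name` and `zip` values of the received subset are then redacted, and other attributes such as a `plan` field pass through unchanged. A real deployment would likely weight attributes by how identifying they are rather than counting them equally.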