| CPC G06F 21/6254 (2013.01) [G06F 18/2415 (2023.01)] | 4 Claims |

|
1. A system comprising:
a memory storing a collection of personal information data and a data catalog of the collection of personal information data; and
a processing apparatus configured to execute:
acquiring designation of metadata in the data catalog;
acquiring a first data range determining a part of the collection of personal information data;
acquiring a machine learning logic;
generating a machine learning model according to the machine learning logic, based on personal information data corresponding to designated metadata and the first data range;
calculating a personal identification risk which shows a risk of a person being identified based on an output of the machine learning model; and
outputting the machine learning model when the personal identification risk does not exceed a predetermined threshold;
wherein
each item in the collection of personal information data corresponds to one or more pieces of ID information each of which shows a particular individual, and
the calculating the personal identification risk includes:
selecting input data from the collection of personal information data;
acquiring output data of the machine learning model by inputting the input data;
generating correspondence information which shows correspondence between the one or more pieces of ID information corresponding to the input data and the output data for the input data; and
calculating the personal identification risk based on the correspondence information;
the calculating the personal identification risk further includes:
classifying the output data into categories;
calculating, for each of combinations of the one or more pieces of ID information and the categories, a number of pieces of the output data which fall into the category specified by the combination and correspond in the correspondence information to the ID information specified by the combination; and
calculating the personal identification risk according to the following formula:
![]() where:
IR is the personal identification risk, (i, j) is one of the combinations, and u(i, j) is the number of pieces of the output data calculated for (i, j).
|
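
The counting step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names `classify` and `count_combinations`, the threshold-based binning rule, and the sample data are all hypothetical and chosen only for illustration. The sketch stops at u(i, j) and does not attempt to compute IR from it.

```python
from collections import Counter

def classify(output, boundaries):
    """Map a numeric model output to a category index.

    Hypothetical binning rule: the category is the number of
    boundaries that the output meets or exceeds.
    """
    return sum(output >= b for b in boundaries)

def count_combinations(correspondence, boundaries):
    """Return u, where u[(i, j)] is the number of outputs that fall into
    category j and correspond to ID information i."""
    u = Counter()
    for ids, output in correspondence:
        j = classify(output, boundaries)
        for i in ids:  # one input may correspond to several IDs
            u[(i, j)] += 1
    return u

# Correspondence information: (ID information for the input, model output).
# The values below are made up for illustration.
correspondence = [
    (["alice"], 0.1),
    (["alice"], 0.2),
    (["bob"], 0.9),
    (["alice", "bob"], 0.6),
]

u = count_combinations(correspondence, boundaries=[0.5])
for (i, j), n in sorted(u.items()):
    print(f"u({i}, {j}) = {n}")
```

A `Counter` is used here so that combinations never observed simply read as zero without being stored.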