US 11,989,327 B2
Autoencoder-based information content preserving data anonymization system
Justin Bercich, Reykjavik (IS); Theresa Bercich, Reykjavik (IS); Gudmundur Runar Kristjansson, Gardabaer (IS); and Anush Vasudevan, Monroe Township, NJ (US)
Assigned to Lucinity ehf, Reykjavik (IS)
Filed by Lucinity ehf, Reykjavik (IS)
Filed on Dec. 8, 2021, as Appl. No. 17/545,819.
Application 17/545,819 is a continuation of application No. 17/020,453, filed on Sep. 14, 2020, granted, now 11,227,067.
Claims priority of provisional application 62/902,505, filed on Sep. 19, 2019.
Prior Publication US 2022/0100901 A1, Mar. 31, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/084 (2023.01); G06F 21/62 (2013.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01)
CPC G06F 21/6254 (2013.01) [G06N 3/045 (2023.01); G06N 3/048 (2023.01); G06N 3/084 (2013.01)] 23 Claims
 
1. An auto-encoder system for anonymizing data associated with a population of entities, the system comprising:
a computer memory storing specific computer-executable instructions for a neural network, wherein the neural network comprises: an input node; a first layer of nodes for receiving an output from the input node; a second layer of nodes positioned downstream of the first layer of nodes; a third layer of nodes positioned downstream of the second layer of nodes; and an output node for receiving an output from the third layer of nodes to provide an encoded output vector; wherein the second layer of nodes includes a number of nodes that is greater than a number of nodes in the first layer of nodes and is greater than a number of nodes in the third layer of nodes;
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the computer-executable instructions to at least:
obtain data identifying a plurality of characteristics in a human readable text comprising one or more letters or numbers associated with at least a subset of the entities in the population;
prepare a plurality of input vectors that include more than one of the plurality of characteristics, wherein the characteristics appear in the respective input vectors in a human recognizable form; and
train the neural network with the plurality of input vectors, wherein the training comprises a plurality of training cycles wherein the training cycles comprise: inputting one of the input vectors at the input nodes; processing said input vector with the neural network to provide an encoded output vector at the output nodes; determining an output vector reconstruction error by calculating a function of the encoded output vector and the respective input vector; back-propagating the output vector reconstruction error back through the neural network from the output nodes back to the input nodes by a chained derivative of the outputs and weights of the intervening nodes; and recalibrating a weight in one or more of the nodes in the neural network to minimize the output vector reconstruction error;
wherein a plurality of the encoded output vectors during training include at least one of the plurality of characteristics recognizable for comparison by a processor to identify two or more encoded output vectors with a common characteristic but wherein said plurality of the encoded output vectors does not contain said at least one of the plurality of characteristics in a human recognizable form.
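The claim's "prepare a plurality of input vectors" step requires the characteristics to stay in human-recognizable form inside the input vectors. Below is a minimal sketch, assuming one possible encoding that the patent does not specify. The characteristics are joined with a hypothetical `|` separator, the UTF-8 bytes are scaled into [0, 1], and the result is zero-padded to a fixed, hypothetical length of 32. Because each value is just a scaled character code, the original text can be decoded back from the vector. The function names are illustrative only.

```python
def prepare_input_vector(characteristics, length=32):
    """Hypothetical encoding: join the text characteristics, scale UTF-8 bytes to [0, 1]."""
    text = "|".join(characteristics)          # "|" separator is an assumption
    data = text.encode("utf-8")[:length]      # truncate long records
    data = data + bytes(length - len(data))   # zero-pad short records
    return [b / 255.0 for b in data]


def decode_input_vector(vector):
    """Invert the scaling to show the characteristics are still human recognizable."""
    data = bytes(round(v * 255) for v in vector).rstrip(b"\x00")
    # errors="replace" guards against a multi-byte character cut off by truncation
    return data.decode("utf-8", errors="replace")


vec = prepare_input_vector(["Jane Doe", "IS", "1984"])
print(len(vec), decode_input_vector(vec))
```

A byte-level encoding keeps the sketch dependency-free. A real system could just as plausibly use one-hot character encodings or a tokenizer.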
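The network recited in claim 1 has an input, three hidden layers whose middle layer is wider than the other two, and an output that is compared against the input to obtain a reconstruction error. That error is then back-propagated by chained derivatives to recalibrate the weights. Below is a minimal pure-Python sketch of one such training cycle. It assumes sigmoid activations, a mean-squared-error reconstruction function, and plain per-sample gradient descent with a hypothetical learning rate of 0.5. The patent does not fix any of these choices, and all names are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(dims, seed=0):
    """dims = [n, h1, h2, h3, n]; the second hidden layer must be the widest."""
    if len(dims) != 5 or dims[0] != dims[4] or not (dims[2] > dims[1] and dims[2] > dims[3]):
        raise ValueError("expected [n, h1, h2, h3, n] with h2 wider than h1 and h3")
    rng = random.Random(seed)
    # Each layer is (weights[j][i] from input i to node j, biases[j]).
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(dims[:-1], dims[1:])]


def forward(network, x):
    """Return the activations of every layer, starting with the input itself."""
    acts = [list(x)]
    for weights, biases in network:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * a for w, a in zip(row, prev)) + b)
                     for row, b in zip(weights, biases)])
    return acts


def reconstruction_error(output, target):
    """Assumed error function: mean squared error between output and input."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)


def train_cycle(network, x, lr=0.5):
    """One training cycle: forward pass, error, back-propagation, weight update."""
    acts = forward(network, x)
    output = acts[-1]
    n = len(x)
    # dE/dz at the output nodes: dE/do * do/dz, with sigmoid'(z) = o * (1 - o).
    delta = [(2.0 / n) * (o - t) * o * (1.0 - o) for o, t in zip(output, x)]
    for layer in range(len(network) - 1, -1, -1):
        weights, biases = network[layer]
        prev = acts[layer]
        if layer > 0:
            # Chain rule: propagate delta to the upstream layer using pre-update weights.
            prev_delta = [sum(weights[j][i] * delta[j] for j in range(len(delta)))
                          * prev[i] * (1.0 - prev[i]) for i in range(len(prev))]
        for j, d in enumerate(delta):
            for i, a in enumerate(prev):
                weights[j][i] -= lr * d * a   # dE/dw_ji = delta_j * a_i
            biases[j] -= lr * d
        if layer > 0:
            delta = prev_delta
    return reconstruction_error(output, x)   # error measured before this update


def train(network, vectors, epochs=200, lr=0.5):
    """Repeat training cycles; return the mean error of the final epoch."""
    total = 0.0
    for _ in range(epochs):
        total = sum(train_cycle(network, v, lr) for v in vectors)
    return total / len(vectors)
```

For example, `init_network([6, 3, 8, 3, 6])` builds a network whose middle hidden layer (8 nodes) is wider than the layers on either side of it, as the claim requires. Calling `train` on a handful of input vectors then lowers the mean reconstruction error.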
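The final clause of claim 1 says that a processor can compare encoded output vectors to find records that share a characteristic, even though the vectors no longer show that characteristic in human-readable form. The claim does not say how the comparison is done. Below is a minimal sketch that substitutes one common technique, cosine similarity with a hypothetical threshold of 0.99. The record ids, the threshold, and the function names are all illustrative assumptions.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Assumed comparison metric; the patent does not name one."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_matching_pairs(encoded, threshold=0.99):
    """encoded maps a record id to its encoded output vector.

    Returns id pairs whose vectors are similar enough to suggest a
    common characteristic (hypothetical threshold).
    """
    return [(a, b) for a, b in combinations(sorted(encoded), 2)
            if cosine_similarity(encoded[a], encoded[b]) >= threshold]


encoded = {"r1": [0.2, 0.8, 0.1], "r2": [0.21, 0.79, 0.1], "r3": [0.9, 0.1, 0.5]}
print(find_matching_pairs(encoded))
```

In this toy data, `r1` and `r2` point in nearly the same direction and are paired, while `r3` is not matched to either. A distance-based test or a learned classifier could equally well serve as the comparison.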