CPC G06F 16/215 (2019.01) [G06F 16/55 (2019.01); G06F 16/5838 (2019.01); G06F 16/5846 (2019.01); G06F 16/5866 (2019.01)] | 20 Claims |
1. A system for performing data analytics on sensitive data in remote network environments without exposing content of the sensitive data, the system comprising:
one or more processors; and
a non-transitory, computer-readable medium storing instructions that, when executed by the one or more processors, cause operations comprising:
receiving a first request to perform a first data analytics operation for a set of sensitive data instances, wherein the first data analytics operation enables deduplication of data instances;
retrieving a first set of image representations for the set of sensitive data instances, wherein the first set of image representations comprises:
a first image representation for a first sensitive data instance for the set of sensitive data instances, wherein the first image representation comprises a first set of alphanumeric characters in the first sensitive data instance mapped to a first set of color coding; and
a second image representation for a second sensitive data instance for the set of sensitive data instances;
clustering the first sensitive data instance and the second sensitive data instance into a first cluster based on similarities between the first image representation and the second image representation;
retrieving a first instance identifier for the first sensitive data instance and a second instance identifier for the second sensitive data instance; and
labeling the first instance identifier and the second instance identifier as corresponding to a duplicate.
|
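The steps recited in claim 1 can be illustrated with a minimal sketch. It assumes a hypothetical per-character color table, a fixed image width, and a simple pixel-match similarity with a threshold; none of these are specified by the claim. The names `char_to_color`, `to_image`, `similarity`, `cluster_by_similarity`, and `label_duplicates` are illustrative only. The clustering and labeling functions operate solely on image representations and instance identifiers, never on the original text. The sketch does not address how strongly any particular color coding conceals the underlying content.

```python
from itertools import combinations

# Hypothetical color coding: each alphanumeric character gets a distinct RGB triple.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
FILLER = (255, 255, 255)  # assumed color for padding and non-alphanumeric characters


def char_to_color(ch):
    """Map one character to an RGB triple (illustrative mapping, not from the claim)."""
    idx = ALPHABET.find(ch.upper())
    if idx < 0:
        return FILLER
    return (idx * 7 % 256, idx * 13 % 256, idx * 29 % 256)


def to_image(text, width=16):
    """Build a fixed-width, one-row 'image' (a list of pixels) from a data instance."""
    pixels = [char_to_color(ch) for ch in text[:width]]
    return pixels + [FILLER] * (width - len(pixels))


def similarity(img_a, img_b):
    """Fraction of pixel positions at which two equally sized images agree."""
    matches = sum(1 for a, b in zip(img_a, img_b) if a == b)
    return matches / len(img_a)


def cluster_by_similarity(images, threshold=0.9):
    """Group instance identifiers whose images are at least `threshold` similar.

    `images` maps instance identifier -> image representation. Uses a small
    union-find so that similarity chains end up in one cluster.
    """
    parent = {key: key for key in images}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    for a, b in combinations(images, 2):
        if similarity(images[a], images[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in images:
        groups.setdefault(find(key), set()).add(key)
    return list(groups.values())


def label_duplicates(clusters):
    """Label every identifier that shares a cluster with another identifier."""
    return {ident: "duplicate" for c in clusters if len(c) > 1 for ident in sorted(c)}
```

In use, the image conversion would run wherever the sensitive data resides, and only the identifier-to-image mapping would be passed to `cluster_by_similarity` and `label_duplicates` in the remote environment. The threshold is a tuning assumption: a lower value tolerates more character-level differences between instances at the cost of more false duplicates.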