US 11,921,686 B1
Systems and methods for performing data analytics on sensitive data in remote network environments without exposing content of the sensitive data
Leena Mary Francis, Pondicherry (IN); Vaibhav Kumar, Monroe, NJ (US); and Ashutosh Pandey, Brandon, FL (US)
Assigned to Citibank, N.A., New York, NY (US)
Filed by Citibank, N.A., New York, NY (US)
Filed on Aug. 24, 2023, as Appl. No. 18/455,353.
Application 18/455,353 is a continuation of application No. 18/299,506, filed on Apr. 12, 2023, granted, now 11,770,521.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/215 (2019.01); G06F 16/55 (2019.01); G06F 16/58 (2019.01); G06F 16/583 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/55 (2019.01); G06F 16/5838 (2019.01); G06F 16/5846 (2019.01); G06F 16/5866 (2019.01)]
20 Claims
1. A system for performing data analytics on sensitive data in remote network environments without exposing content of the sensitive data, the system comprising:
one or more processors; and
a non-transitory, computer-readable medium storing instructions that, when executed by the one or more processors, cause operations comprising:
receiving a first request to perform a first data analytics operation for a set of sensitive data instances, wherein the first data analytics operation enables deduplication of data instances;
retrieving a first set of image representations for the set of sensitive data instances, wherein the first set of image representations comprises:
a first image representation for a first sensitive data instance for the set of sensitive data instances, wherein the first image representation comprises a first set of alphanumeric characters in the first sensitive data instance mapped to a first set of color coding; and
a second image representation for a second sensitive data instance for the set of sensitive data instances;
clustering the first sensitive data instance and the second sensitive data instance into a first cluster based on similarities between the first image representation and the second image representation;
retrieving a first instance identifier for the first sensitive data instance and a second instance identifier for the second sensitive data instance; and
labeling the first instance identifier and the second instance identifier as corresponding to a duplicate.
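
The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It makes several assumptions that do not come from the claim: the color coding is a fixed per-character RGB lookup, an "image representation" is a fixed-length row of pixels, similarity is the fraction of matching pixels, and clustering is a simple threshold-based grouping. All names (`char_to_color`, `to_image`, `similarity`, `cluster`, `label_duplicates`) and the sample records are hypothetical.

```python
import string

# Hypothetical color coding: every alphanumeric character gets a fixed RGB triple.
ALPHABET = string.digits + string.ascii_uppercase
PAD = (0, 0, 0)  # padding pixel; never produced by char_to_color


def char_to_color(ch):
    """Map one alphanumeric character to an RGB color (assumed scheme)."""
    i = ALPHABET.index(ch) + 1  # start at 1 so no character collides with PAD
    return (i * 7 % 256, i * 37 % 256, i * 101 % 256)


def to_image(text, length=16):
    """Build a fixed-length row of pixels from the alphanumeric characters of text."""
    chars = [c for c in text.upper() if c in ALPHABET][:length]
    pixels = [char_to_color(c) for c in chars]
    return pixels + [PAD] * (length - len(pixels))


def similarity(a, b):
    """Fraction of pixel positions at which two equal-length images agree."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


def cluster(images, threshold=0.9):
    """Group instance identifiers whose images are at least `threshold` similar."""
    parent = {k: k for k in images}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = list(images)
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if similarity(images[a], images[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for k in ids:
        clusters.setdefault(find(k), []).append(k)
    return list(clusters.values())


def label_duplicates(clusters):
    """Mark every identifier that shares a cluster with another identifier."""
    return {i: len(c) > 1 for c in clusters for i in c}


# Made-up sample records keyed by instance identifier.
records = {
    "id-1": "John Smith 1980",
    "id-2": "john smith 1980",
    "id-3": "Jane Doe 1975",
}
# Only the images and identifiers would be handed to the analytics step.
images = {key: to_image(value) for key, value in records.items()}
clusters = cluster(images)
labels = label_duplicates(clusters)
print(clusters)  # [['id-1', 'id-2'], ['id-3']]
print(labels)    # {'id-1': True, 'id-2': True, 'id-3': False}
```

In this sketch the clustering and labeling functions operate only on pixel rows and instance identifiers and never receive the original character strings, which mirrors the claim's separation between the sensitive content and the analytics operation. How strongly the image hides the content depends on the color coding actually used, which the claim does not specify.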