CPC G06F 21/6254 (2013.01) [G06F 16/24558 (2019.01); G06F 21/604 (2013.01); G06N 20/00 (2019.01)] | 20 Claims
1. A system for automatically masking sensitive data and detecting and avoiding false positives, the system comprising:
one or more memories; and
one or more processors, coupled to the one or more memories, configured to:
receive a set of data intended for inclusion in a data store;
detect, within the set of data and using a detection model, potential sensitive data fields, wherein the detection model is configured using at least one of:
data type matching that identifies a correspondence between one or more stored data types and one or more data types associated with the potential sensitive data fields, or
pattern matching that identifies a correspondence between one or more stored data patterns and one or more data patterns included in the potential sensitive data fields;
identify, within the set of data, a set of characters within a threshold quantity of characters of a median character included in at least one potential sensitive data field of the potential sensitive data fields;
detect, within the potential sensitive data fields and using a false positive model, at least one non-sensitive data field by identifying within the set of characters at least one of one or more patterns or one or more words or phrases that are within the threshold quantity of characters of the median character, wherein the false positive model is configured using contextual analysis that determines whether one or more stored characters, phrases, or data patterns correspond to one or more characters, phrases, or data patterns included in the set of data, and wherein the contextual analysis is applied according to the set of characters;
modify the set of data to mask the potential sensitive data fields other than the at least one non-sensitive data field; and
output the modified set of data to the data store.