US 11,782,965 B1
Systems and methods for normalizing data store classification information
Shailesh A. Dargude, San Jose, CA (US); Satish Grandhi, Santa Clara, CA (US); and Joshua V. Stageberg, San Francisco, CA (US)
Assigned to Veritas Technologies LLC, Santa Clara, CA (US)
Filed by Veritas Technologies LLC, Mountain View, CA (US)
Filed on Aug. 29, 2018, as Appl. No. 16/116,522.
Claims priority of provisional application 62/653,541, filed on Apr. 5, 2018.
Int. Cl. G06F 16/35 (2019.01); G06N 20/00 (2019.01); G06F 21/57 (2013.01); G06F 16/33 (2019.01); G06F 18/214 (2023.01)
CPC G06F 16/35 (2019.01) [G06F 16/3347 (2019.01); G06F 18/214 (2023.01); G06F 21/577 (2013.01); G06N 20/00 (2019.01)]
20 Claims
 
13. A system for normalizing data store classification information, the system comprising:
a first receiving module, stored in a memory, that receives, at the system, classification information from multiple different data store content classification sources;
a training module, stored in the memory, that trains a continuous bag of words (CBOW) classification model with the classification information;
a second receiving module, stored in the memory, that receives a classification tag from a data store, wherein the classification tag indicates a classification of respectively stored data that is classified by one of the data store content classification sources, wherein the one of the data store content classification sources comprises a set including a plurality of different sensitivity levels;
a classifying module, stored in the memory, that automatically classifies, with the trained CBOW classification model, the received classification tag to a corresponding command tag in a uniform tag set, wherein the command tag represents a meaning of the classification tag, wherein the command tag further indicates a presence or absence of personally identifiable information (PII), and wherein the classification tag is normalized to the corresponding command tag as the presence of the PII based on a first subset of the plurality of different sensitivity levels and the absence of the PII based on a second subset of the plurality of information sensitivity levels, wherein the classifying module further:
identifies, in response to classifying the received classification tag to the corresponding command tag, a potential security risk; and
performs, in response to identifying the potential security risk, a security action in an attempt to ameliorate the potential security risk, wherein the security action comprises limiting access to the data store based on each of the plurality of different sensitivity levels; and
at least one physical processor that executes the first receiving module, the training module, the second receiving module, and the classifying module.
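
The flow recited in claim 13 (train on classification information, map a received tag to a tag in the uniform tag set, then act on a potential risk) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the two-entry uniform tag set, the training pairs, the three sensitivity levels, and the access policy. The model is a CBOW-style network in pure Python in which the tokens of a source tag serve as the averaged context and the uniform tag serves as the predicted word; pairing the data this way is an assumption about how such training could be arranged, not the claimed implementation.

```python
import math
import random

UNIFORM_TAGS = ["PII", "NO_PII"]  # hypothetical uniform tag set

# Hypothetical training pairs: (tokens of a source-specific tag, uniform tag).
# Levels "high" and "medium" stand for the first subset, "low" for the second.
TRAINING = [
    (["high", "ssn"], "PII"),
    (["high", "passport"], "PII"),
    (["medium", "payroll"], "PII"),
    (["medium", "credit", "card"], "PII"),
    (["low", "brochure"], "NO_PII"),
    (["low", "press", "release"], "NO_PII"),
]

# Hypothetical access policy, one entry per sensitivity level.
ACCESS_BY_LEVEL = {"high": "owners-only", "medium": "department-only", "low": "all-staff"}


def average(emb, tokens, dim):
    """CBOW hidden layer: mean of the embeddings of the known context tokens."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def softmax_over_tags(out, h):
    """Score each uniform tag against the hidden vector and normalize."""
    scores = {t: sum(a * b for a, b in zip(w, h)) for t, w in out.items()}
    top = max(scores.values())
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def train_cbow(pairs, dim=8, epochs=300, lr=0.2, seed=0):
    """Train input embeddings and output weights with plain SGD on cross-entropy."""
    rng = random.Random(seed)
    vocab = sorted({tok for toks, _ in pairs for tok in toks})
    emb = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    out = {t: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for t in UNIFORM_TAGS}
    for _ in range(epochs):
        for toks, target in pairs:
            h = average(emb, toks, dim)
            probs = softmax_over_tags(out, h)
            grad_h = [0.0] * dim
            for t in UNIFORM_TAGS:
                err = probs[t] - (1.0 if t == target else 0.0)
                for i in range(dim):
                    grad_h[i] += err * out[t][i]  # uses the pre-update weight
                    out[t][i] -= lr * err * h[i]
            for tok in toks:
                for i in range(dim):
                    emb[tok][i] -= lr * grad_h[i] / len(toks)
    return emb, out


def normalize(model, tag):
    """Map a received classification tag to the most probable uniform tag."""
    emb, out = model
    dim = len(out[UNIFORM_TAGS[0]])
    probs = softmax_over_tags(out, average(emb, tag.lower().split(), dim))
    return max(probs, key=probs.get)


def security_action(uniform_tag, level):
    """Return an access limit for the level when PII is indicated, else None."""
    if uniform_tag != "PII":
        return None
    return ACCESS_BY_LEVEL[level]


model = train_cbow(TRAINING)
tag = normalize(model, "High SSN")
print(tag, security_action(tag, "high"))
```

A production system would more likely rely on an established word-embedding library and a far larger corpus of classification information; the hand-rolled training loop is used here only to keep the sketch dependency-free.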