US 12,032,615 B2
System and method for sensitive content analysis prioritization based on file metadata
Shishir Sharma, Mountain View, CA (US); Amrit Jassal, Morgan Hill, CA (US); Sean H. Puttergill, Sunnyvale, CA (US); Willy Lanig Picard, Poznan (PL); and Marcin Artur Zablocki, Poznan (PL)
Assigned to Egnyte, Inc., Mountain View, CA (US)
Filed by Egnyte, Inc., Mountain View, CA (US)
Filed on May 23, 2023, as Appl. No. 18/200,985.
Application 18/200,985 is a continuation of application No. 16/862,482, filed on Apr. 29, 2020, granted, now 11,714,842.
Claims priority of provisional application 62/840,623, filed on Apr. 30, 2019.
Prior Publication US 2023/0401248 A1, Dec. 14, 2023
Int. Cl. G06F 16/35 (2019.01); G06F 16/182 (2019.01)
CPC G06F 16/353 (2019.01) [G06F 16/183 (2019.01)]
20 Claims
1. In a data governance system, a method for determining a likelihood that a file system object contains sensitive content, said method comprising:
obtaining a set of training metadata corresponding to a set of training file system objects, each file system object of said set of training file system objects having a known status indicating that said each file system object either contains sensitive content or does not contain sensitive content;
processing said set of training metadata to extract a set of training features from said set of training metadata, said training features corresponding to a particular subset of said training metadata being indicative of the probability that a corresponding particular training file system object contains sensitive content;
analyzing said set of training features to determine a relationship between said set of training features and said known statuses of said set of training file system objects;
informing a sensitivity estimation algorithm according to said relationship between said set of training features and said known statuses of said set of training file system objects;
obtaining first metadata and second metadata, said first metadata corresponding to a first file system object and said second metadata corresponding to a second file system object;
analyzing said first metadata according to said sensitivity estimation algorithm to generate a first estimate value based at least in part on said first metadata, said first estimate value being indicative of a first likelihood that said first file system object includes sensitive content;
analyzing said second metadata according to said sensitivity estimation algorithm to generate a second estimate value based at least in part on said second metadata, said second estimate value being indicative of a second likelihood that said second file system object includes sensitive content;
prioritizing said first file system object and said second file system object based at least in part on said first estimate value and said second estimate value; and
performing an operation on said first file system object prior to performing said operation on said second file system object based at least in part on results of said prioritizing.
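
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a particular learning technique, so this example assumes a simple smoothed log-odds (naive-Bayes-style) model, and the feature extractor, the `path` metadata field, the function names, and the sample paths are all hypothetical choices made for illustration.

```python
import math
import re
from collections import defaultdict

def extract_features(metadata):
    """Hypothetical feature extraction: file extension plus path tokens."""
    path = metadata["path"].lower()
    name = path.rsplit("/", 1)[-1]
    ext = name.rsplit(".", 1)[-1] if "." in name else ""
    features = {"ext=" + ext}
    for token in re.split(r"[^a-z0-9]+", path):
        if token:
            features.add("tok=" + token)
    return features

def train(training_metadata, known_statuses):
    """Relate training features to known statuses (assumed: smoothed log-odds)."""
    counts = {True: defaultdict(int), False: defaultdict(int)}
    totals = {True: 0, False: 0}
    for metadata, sensitive in zip(training_metadata, known_statuses):
        totals[sensitive] += 1
        for feature in extract_features(metadata):
            counts[sensitive][feature] += 1
    weights = {}
    for feature in set(counts[True]) | set(counts[False]):
        # Add-one smoothing keeps unseen feature/class pairs finite.
        p_pos = (counts[True][feature] + 1) / (totals[True] + 2)
        p_neg = (counts[False][feature] + 1) / (totals[False] + 2)
        weights[feature] = math.log(p_pos / p_neg)
    bias = math.log((totals[True] + 1) / (totals[False] + 1))
    return weights, bias

def estimate(model, metadata):
    """Return an estimate value in (0, 1) from metadata alone."""
    weights, bias = model
    score = bias + sum(weights.get(f, 0.0) for f in extract_features(metadata))
    return 1.0 / (1.0 + math.exp(-score))

def prioritize(model, objects):
    """Order file system objects from most to least likely sensitive."""
    return sorted(objects, key=lambda md: estimate(model, md), reverse=True)

# Illustrative training set with known statuses (made-up paths).
training_metadata = [
    {"path": "/hr/payroll_2019.xlsx"},
    {"path": "/hr/employee_ssn.csv"},
    {"path": "/finance/payroll_q1.xlsx"},
    {"path": "/marketing/logo.png"},
    {"path": "/marketing/banner.png"},
    {"path": "/eng/readme.txt"},
]
known_statuses = [True, True, True, False, False, False]
model = train(training_metadata, known_statuses)

first = {"path": "/hr/payroll_2020.xlsx"}
second = {"path": "/marketing/photo.png"}
queue = prioritize(model, [second, first])

# Perform the operation (here, a stand-in for content analysis) in priority order.
for metadata in queue:
    print(f"{estimate(model, metadata):.3f}  {metadata['path']}")
```

In this sketch the estimate is computed from metadata only, so the more expensive content scan can be scheduled for the highest-scoring objects first. Any classifier that maps metadata features to a likelihood could stand in for the log-odds model assumed here.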