CPC G06F 21/6245 (2013.01) [G06F 21/6227 (2013.01); H04L 41/0627 (2013.01); H04L 41/22 (2013.01); H04L 67/02 (2013.01)]
19 Claims
1. A method, in a data processing system, for identifying sensitive data risks in cloud-based deployments, the method comprising:
building a knowledge graph based on data schema information for a cloud-based computing environment, a set of parsed infrastructure logs, and a set of captured application queries;
identifying a set of sensitive flows in the knowledge graph representing paths from a sensitive data element to an endpoint in the knowledge graph;
scoring the set of sensitive flows based on a scoring algorithm, wherein the scoring algorithm determines, for each sensitive flow, a score along a centrality dimension at least by generating, for each vertex in the set of sensitive flows, a ranking score based on a propagation of a rank value from one vertex to another connected vertex in the set of sensitive flows; and
issuing an alert to an administrator in response to a score of a sensitive flow within the set of sensitive flows exceeding a threshold, wherein the scoring algorithm further determines, for each sensitive flow, a score along the centrality dimension at least by, for vertices that do not have outgoing edges, performing a teleportation operation that teleports propagation of the rank value to a randomly selected vertex using a damping factor to model a probability that data read from one data element is not propagated further, wherein the teleportation operation is limited to vertices in the set of sensitive flows, and wherein vertices with a higher concentration of sensitive data have a higher relative ranking score.
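The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not fix a formula, so the standard PageRank update is used here as a stand-in for the rank propagation. Rank held by vertices without outgoing edges is teleported uniformly, and only to vertices that belong to the sensitive flows. The function names, the example vertex names, the default damping value of 0.85, and the choice to score a flow as the sum of its vertex ranks are all assumptions made for illustration. Weighting vertices by their concentration of sensitive data is not modeled.

```python
from collections import defaultdict


def find_sensitive_flows(edges, sensitive, endpoints):
    """Enumerate simple paths from each sensitive vertex to any endpoint."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    flows = []

    def dfs(path):
        node = path[-1]
        if node in endpoints and len(path) > 1:
            flows.append(tuple(path))
        for nxt in adj[node]:
            if nxt not in path:  # keep paths simple (no repeated vertex)
                dfs(path + [nxt])

    for start in sorted(sensitive):
        dfs([start])
    return flows


def rank_flow_vertices(flows, damping=0.85, iterations=100):
    """PageRank-style ranking restricted to vertices on sensitive flows.

    Assumption: uniform teleportation. Rank sitting on vertices with no
    outgoing edges is redistributed only among flow vertices.
    """
    vertices = sorted({v for flow in flows for v in flow})
    out = {v: set() for v in vertices}
    for flow in flows:
        for u, v in zip(flow, flow[1:]):
            out[u].add(v)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Total rank currently held by vertices without outgoing edges.
        dangling = sum(rank[v] for v in vertices if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for u in vertices:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def flows_to_alert(flows, rank, threshold):
    """Hypothetical flow score: sum of the ranks of the flow's vertices."""
    scored = [(sum(rank[v] for v in flow), flow) for flow in flows]
    return [(score, flow) for score, flow in scored if score > threshold]


# Toy knowledge graph; all vertex names are invented for this example.
edges = [
    ("ssn_column", "orders_query"),
    ("email_column", "orders_query"),
    ("orders_query", "api_service"),
    ("api_service", "public_endpoint"),
    ("api_service", "internal_log"),
]
flows = find_sensitive_flows(
    edges,
    sensitive={"ssn_column", "email_column"},
    endpoints={"public_endpoint", "internal_log"},
)
ranks = rank_flow_vertices(flows)
for score, flow in flows_to_alert(flows, ranks, threshold=0.5):
    print(f"ALERT score={score:.3f} flow={' -> '.join(flow)}")
```

In this sketch the threshold passed to `flows_to_alert` plays the role of the claimed alert threshold. Because the ranks sum to 1 across the flow vertices, a flow's score is the share of total rank carried by its path.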