CPC G06F 21/604 (2013.01) [G06F 16/2255 (2019.01); G06F 16/2458 (2019.01); G06F 16/29 (2019.01); H04L 9/0643 (2013.01)] | 20 Claims |
1. A method comprising:
generating a first set of one or more tokens from a data-in-motion object; and
based at least partly on the first set of tokens, determining whether the data-in-motion object violates a data leakage prevention rule for a dataset, wherein determining whether the data-in-motion object violates the data leakage prevention rule for the dataset comprises,
querying each of a plurality of minimal perfect hashing functions with each of the first set of tokens, wherein each of the plurality of minimal perfect hashing functions corresponds to a different class of token frequency within the dataset, wherein a first of the plurality of minimal perfect hashing functions was created from a key set of unique tokens within the dataset and a second of the plurality of minimal perfect hashing functions was created from a key set of tokens that occur infrequently within the dataset according to a defined frequency criterion;
determining one or more data field indexes and one or more record indicators of the dataset for those of the first set of tokens that hit in at least one of the first and second minimal perfect hashing functions; and
determining whether the data field indexes complete a data field pattern specified by the data leakage prevention rule for at least one record of the dataset indicated by one of the record indicators.
|
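The steps recited in claim 1 can be illustrated with a small Python sketch. It is not the patented implementation. The names `TinyMPHF`, `build_indexes`, `violates`, and `rare_max`, the sample records, and the tokenizer are all hypothetical. The brute-force seed search stands in for a production minimal perfect hash construction and is only practical for very small key sets. Because a minimal perfect hash function maps any input to some slot, the sketch stores a fingerprint per slot so that tokens outside the key set can be rejected; that check is an assumption about how a "hit" is detected.

```python
import hashlib
import re
from collections import defaultdict


def _h(token: str, seed: int) -> int:
    """Seeded 64-bit hash of a token (SHA-256 truncated to 8 bytes)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class TinyMPHF:
    """Brute-force minimal perfect hash over a small key set (illustrative only)."""

    def __init__(self, postings):
        # postings: token -> list of (record_id, field_index)
        keys = sorted(postings)
        self.n = len(keys)
        self.seed = 0
        if self.n:
            # Try seeds until every key lands in its own slot in [0, n).
            while len({_h(k, self.seed) % self.n for k in keys}) != self.n:
                self.seed += 1
        self.slots = [None] * self.n
        for k in keys:
            # Keep a fingerprint so non-member tokens can be rejected on query.
            self.slots[_h(k, self.seed) % self.n] = (_h(k, -1), postings[k])

    def query(self, token):
        """Return (record_id, field_index) pairs on a hit, else an empty list."""
        if not self.n:
            return []
        fingerprint, locations = self.slots[_h(token, self.seed) % self.n]
        return locations if fingerprint == _h(token, -1) else []


def tokenize(text):
    """Assumed tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(records, rare_max=2):
    """Build one MPHF for unique tokens and one for infrequent tokens.

    rare_max is an assumed frequency criterion; more frequent tokens are dropped.
    """
    postings = defaultdict(list)
    for rec_id, fields in enumerate(records):
        for field_idx, value in enumerate(fields):
            for tok in tokenize(value):
                postings[tok].append((rec_id, field_idx))
    unique = {t: p for t, p in postings.items() if len(p) == 1}
    rare = {t: p for t, p in postings.items() if 1 < len(p) <= rare_max}
    return [TinyMPHF(unique), TinyMPHF(rare)]


def violates(text, mphfs, required_fields):
    """True if the hits complete the field pattern for at least one record."""
    seen = defaultdict(set)  # record_id -> field indexes matched so far
    for tok in tokenize(text):
        for mphf in mphfs:  # query every MPHF with every token
            for rec_id, field_idx in mphf.query(tok):
                seen[rec_id].add(field_idx)
    return any(required_fields <= fields for fields in seen.values())


# Hypothetical dataset: (first name, last name, card number).
records = [
    ("Alice", "Nguyen", "4111111111111111"),
    ("Bob", "Nguyen", "5500000000000004"),
    ("Carol", "Ortiz", "340000000000009"),
]
mphfs = build_indexes(records)
rule = {1, 2}  # assumed rule: last name and card number of the same record

leak = violates("Refund for Nguyen, card 5500000000000004", mphfs, rule)
mixed = violates("Nguyen called about card 340000000000009", mphfs, rule)
```

In this sketch `leak` is true because both required fields hit for the same record, while `mixed` is false because the last name and the card number belong to different records. This mirrors the claim's requirement that the data field pattern be completed for at least one record.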