US 12,141,314 B2
Secret detection on computing platform
Phillip Marvin Tischler, Mineola, NY (US); Seth Joseph Vargo, Pittsburgh, PA (US); Timothy Dylan Peacock, San Francisco, CA (US); Colin Man, New York, NY (US); and Scott Tyler Ellis, San Carlos, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Aug. 31, 2021, as Appl. No. 17/462,939.
Prior Publication US 2023/0063214 A1, Mar. 2, 2023
Int. Cl. G06F 21/62 (2013.01); G06F 16/22 (2019.01); G06F 16/2455 (2019.01); G06F 16/28 (2019.01); G06F 16/93 (2019.01)
CPC G06F 21/6227 (2013.01) [G06F 16/2255 (2019.01); G06F 16/24568 (2019.01); G06F 16/285 (2019.01); G06F 16/93 (2019.01)]
15 Claims
1. A computing platform comprising:
one or more hardware processors configured to control operations to:
receive a data stream of one or more digital documents;
retrieve a collection of secret data in a repository, the secret data comprising a plurality of secrets, wherein each secret in the plurality of secrets is associated with one or more entities in communication with one or more computing systems;
detect, based on a comparison of the data stream with the secret data, one or more secrets in the data stream; and
in response to the detection, send an indication of the presence of the detected secrets to the one or more entities associated with the detected secrets,
wherein the data stream and the plurality of secrets are encoded as strings;
wherein in detecting the presence of the one or more secrets, the one or more hardware processors are further configured to control operations to:
index the plurality of secrets in the repository,
identify one or more potential secrets in the data stream based on predetermined false-positive probabilities that the one or more potential secrets are in the repository, and
identify the one or more secrets from the one or more potential secrets in the data stream;
wherein in indexing the secret data, the one or more hardware processors are further configured to control an operation to generate one or more filters, the one or more filters indicating the existence or absence of secrets in the secret data with non-zero probability; and
wherein in detecting the presence of the one or more secrets in the data stream, the one or more hardware processors are configured to control operations to:
generate hash values for each substring of a minimum predetermined length in the data stream;
query the hash values through the one or more filters to identify potential secrets in the repository with non-zero probability to generate a filtered data stream; and
process the potential secrets through one or more search data structures comprising secrets in the repository to detect the presence of the one or more secrets in the data stream.
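
The two-stage detection recited in claim 1 (a probabilistic filter followed by an exact search structure) can be illustrated with a minimal sketch. The claim does not name a specific filter or search structure, so everything here is an assumption made for illustration: the filter is a Bloom filter, the search data structure is a dictionary keyed by each secret's first MIN_LEN characters, and the names MIN_LEN, BloomFilter, build_index, and detect are hypothetical. The sketch hashes every substring of length MIN_LEN in the data stream, discards substrings the filter rejects, and confirms the remaining candidates against the exact index before returning the entities to notify.

```python
import hashlib
from collections import defaultdict

MIN_LEN = 8  # hypothetical "minimum predetermined length" for substrings


class BloomFilter:
    """Assumed filter type: may report false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_index(secrets):
    """Index the repository. `secrets` maps each secret string to its entities."""
    bloom = BloomFilter()
    by_prefix = defaultdict(list)  # assumed exact "search data structure"
    for secret in secrets:
        if len(secret) < MIN_LEN:
            continue  # too short to be indexed in this sketch
        prefix = secret[:MIN_LEN]
        bloom.add(prefix)
        by_prefix[prefix].append(secret)
    return bloom, by_prefix


def detect(stream, secrets, bloom, by_prefix):
    """Return {detected secret: entities to notify} for one string data stream."""
    found = {}
    for i in range(len(stream) - MIN_LEN + 1):
        window = stream[i:i + MIN_LEN]
        # Stage 1: cheap probabilistic check on the hashed substring.
        if not bloom.might_contain(window):
            continue
        # Stage 2: exact confirmation, which discards filter false positives.
        for secret in by_prefix.get(window, ()):
            if stream.startswith(secret, i):
                found[secret] = secrets[secret]
    return found
```

In this sketch, a caller would build the index once with `build_index(secrets)` and then run `detect` on each incoming document; sending the indication to the returned entities is left out. Keying the exact index on the same MIN_LEN prefix that the filter sees is one possible design, chosen so that a filter hit translates directly into a small list of candidates to verify.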