US 11,687,528 B2
Systems and methods for discovery, classification, and indexing of data in a native computing system
Haribalan Raghupathy, Seattle, WA (US); Saravanan Pitchaimani, Atlanta, GA (US); Jonathan Lynn, Seattle, WA (US); Rahul Shinde, Seattle, WA (US); Kevin Jones, Atlanta, GA (US); Subramanian Viswanathan, San Ramon, CA (US); Mahesh Sivan, Atlanta, GA (US); Zara Dana, San Francisco, CA (US); Milap Shah, Bengaluru (IN); Sivanandame Chandramohan, Atlanta, GA (US); Abhishek Upadhyay, Bangalore (IN); and Anand Balasubramanian, Bangalore (IN)
Assigned to OneTrust, LLC, Atlanta, GA (US)
Filed by OneTrust, LLC, Atlanta, GA (US)
Filed on Jan. 25, 2022, as Appl. No. 17/584,187.
Claims priority of provisional application 63/141,216, filed on Jan. 25, 2021.
Prior Publication US 2022/0237190 A1, Jul. 28, 2022
Int. Cl. G06F 16/24 (2019.01); G06F 16/2452 (2019.01); G06F 16/22 (2019.01); G06F 40/284 (2020.01); G06F 9/50 (2006.01); G06Q 10/063 (2023.01); G06F 18/21 (2023.01); G06K 9/62 (2022.01)
CPC G06F 16/24524 (2019.01) [G06F 9/5055 (2013.01); G06F 16/22 (2019.01); G06F 18/217 (2023.01); G06F 40/284 (2020.01); G06Q 10/063 (2013.01)]
16 Claims
1. A system comprising:
a non-transitory computer-readable medium storing instructions; and
a processing device communicatively coupled to the non-transitory computer-readable medium,
wherein, the processing device is configured to execute the instructions and thereby perform operations comprising:
deploying, from a first computing system, via a public data network, a client application on a target computing system, the target computing system comprising a plurality of data sources in a private data network;
receiving, from the client application, at the first computing system, target computing system resource data for computing resources available to the target computing system;
causing, by the client application, the target computing system to use the computing resources available on the target computing system to scan the plurality of data sources in the private data network to discover target data stored on the plurality of data sources based on the target computing system resource data;
causing, by the client application, the target computing system to classify each piece of the target data according to a type by:
causing, by the client application, the target computing system to generate a prediction of a type of each piece of the target data using a neural network based classification model and based on at least one of tokenized data corresponding to the target data or labelled data corresponding to the target data; and
identifying each piece of the target data as the type for the target data based on the prediction satisfying a confidence threshold; and
responsive to discovering the target data stored on the plurality of data sources, generating and storing metadata for each of the plurality of data sources, the metadata indicating at least one of the type of the target data, a number of instances of the target data, or a location of the target data on each of the plurality of data sources.
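
The classification and metadata steps recited above can be illustrated with a minimal sketch. It is not the patented implementation. The names tokenize, toy_predictor, classify_piece, and build_metadata are hypothetical, and so are the sample data and the 0.75 threshold. The toy predictor is a rule-based stand-in for the neural network based classification model, and "satisfying a confidence threshold" is assumed here to mean a confidence greater than or equal to the threshold.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: any callable that maps tokenized data to
# (predicted_type, confidence). The claim recites a neural network based
# model; a trained model could be plugged in behind this signature.
Predictor = Callable[[List[str]], Tuple[str, float]]


def tokenize(piece: str) -> List[str]:
    """Assumed tokenizer: lowercase and split on whitespace."""
    return piece.lower().split()


def toy_predictor(tokens: List[str]) -> Tuple[str, float]:
    """Rule-based stand-in for the classification model (illustration only)."""
    if any("@" in t for t in tokens):
        return ("email", 0.95)
    if any(len(t.replace("-", "")) == 9 and t.replace("-", "").isdigit()
           for t in tokens):
        return ("national_id", 0.80)
    return ("unknown", 0.30)


def classify_piece(piece: str, predict: Predictor,
                   threshold: float) -> Optional[str]:
    """Return the predicted type only if its confidence meets the threshold."""
    predicted_type, confidence = predict(tokenize(piece))
    return predicted_type if confidence >= threshold else None


def build_metadata(sources: Dict[str, Dict[str, str]], predict: Predictor,
                   threshold: float = 0.75) -> Dict[str, dict]:
    """For each data source, record types, instance counts, and locations."""
    metadata: Dict[str, dict] = {}
    for source_name, records in sources.items():
        counts: Counter = Counter()
        locations: Dict[str, List[str]] = defaultdict(list)
        for location, piece in records.items():
            data_type = classify_piece(piece, predict, threshold)
            if data_type is not None:
                counts[data_type] += 1
                locations[data_type].append(location)
        metadata[source_name] = {
            "types": sorted(counts),
            "instances": dict(counts),
            "locations": dict(locations),
        }
    return metadata


# Hypothetical data sources, keyed by a location string within each source.
sources = {
    "crm_db": {"users.email": "alice@example.com", "users.note": "hello world"},
    "hr_share": {"file.csv:row1": "123-45-6789"},
}
metadata = build_metadata(sources, toy_predictor, threshold=0.75)
```

In this sketch a piece whose prediction falls below the threshold is left unclassified and contributes nothing to the metadata, so raising the threshold trades coverage for precision.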