US 12,259,882 B2
Systems and methods for discovery, classification, and indexing of data in a native computing system
Haribalan Raghupathy, Seattle, WA (US); Saravanan Pitchaimani, Atlanta, GA (US); Jonathan Lynn, Seattle, WA (US); Rahul Shinde, Seattle, WA (US); Kevin Jones, Atlanta, GA (US); Subramanian Viswanathan, San Ramon, CA (US); Mahesh Sivan, Atlanta, GA (US); Zara Dana, San Francisco, CA (US); Milap Shah, Bengaluru (IN); Sivanandame Chandramohan, Atlanta, GA (US); Abhishek Upadhyay, Bangalore (IN); and Anand Balasubramanian, Bangalore (IN)
Assigned to OneTrust, LLC, Atlanta, GA (US)
Filed by OneTrust, LLC, Atlanta, GA (US)
Filed on May 4, 2023, as Appl. No. 18/312,498.
Application 18/312,498 is a continuation of application No. 17/584,187, filed on Jan. 25, 2022, granted, now Pat. No. 11,687,528.
Claims priority of provisional application 63/141,216, filed on Jan. 25, 2021.
Prior Publication US 2023/0273921 A1, Aug. 31, 2023
Int. Cl. G06F 16/24 (2019.01); G06F 9/50 (2006.01); G06F 16/22 (2019.01); G06F 16/2452 (2019.01); G06F 18/21 (2023.01); G06F 40/284 (2020.01); G06Q 10/063 (2023.01); G06F 16/95 (2019.01)
CPC G06F 16/24524 (2019.01) [G06F 9/5055 (2013.01); G06F 16/22 (2019.01); G06F 18/217 (2023.01); G06F 40/284 (2020.01); G06Q 10/063 (2013.01)]
20 Claims
 
1. A target computing system comprising:
one or more computing devices having access to data at a plurality of data sources on a private data network;
at least one processing device of an external-facing subsystem, wherein the external-facing subsystem facilitates a connection between the private data network and a third-party computing data discovery system; and
a non-transitory computer-readable medium comprising instructions that, when executed by the at least one processing device of the external-facing subsystem, cause the at least one processing device to perform operations comprising:
deploying executable code comprising a neural network based classification model, of the third-party computing data discovery system, to the one or more computing devices corresponding to the plurality of data sources on the private data network to perform scanning and classification operations at the plurality of data sources utilizing the neural network based classification model of the external-facing subsystem without accessing the data at the plurality of data sources on the private data network;
causing the one or more computing devices corresponding to the plurality of data sources to execute the executable code of the external-facing subsystem to scan the plurality of data sources for target data;
causing the one or more computing devices corresponding to the plurality of data sources to execute the executable code of the external-facing subsystem to utilize the neural network based classification model with the target data to generate data type predictions indicating data type labels for the target data by:
generating a prediction score of a particular data type for a data item of the target data based on tokenized data or labelled data corresponding to the target data; and
identifying the particular data type as a data type label for the data item of the target data based on the prediction score satisfying a confidence score threshold; and
responsive to scanning and classifying the target data stored on the plurality of data sources, generating and storing metadata for the plurality of data sources, the metadata indicating types of the target data.
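
For illustration only, the classification and metadata steps recited in claim 1 can be sketched in Python as shown below. This is a minimal sketch under stated assumptions, not the patented implementation: the rule-based predict function merely stands in for the neural network based classification model, and the names tokenize, predict, label_item, build_metadata, the data type labels, and the 0.8 threshold value are all hypothetical and do not appear in the claim.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed value; the claim recites a confidence score threshold but no number.
CONFIDENCE_THRESHOLD = 0.8


def tokenize(item: str) -> List[str]:
    """Split a data item into simple tokens (hypothetical tokenizer)."""
    return re.findall(r"[A-Za-z0-9@.+-]+", item)


def predict(tokens: List[str]) -> Tuple[str, float]:
    """Toy stand-in for the classification model.

    Returns a (data type, prediction score) pair. A real deployment
    would call a trained neural network here instead of fixed rules.
    """
    if any("@" in t and "." in t for t in tokens):
        return "email", 0.95
    digits = sum(ch.isdigit() for ch in " ".join(tokens))
    if digits >= 10:
        return "phone_number", 0.85
    return "unknown", 0.10


def label_item(item: str, threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the data type label only if the score satisfies the threshold."""
    data_type, score = predict(tokenize(item))
    return data_type if score >= threshold else None


def build_metadata(sources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Record, per data source, the data types found (never the raw values)."""
    metadata: Dict[str, List[str]] = {}
    for name, items in sources.items():
        labels = {label_item(item) for item in items}
        labels.discard(None)  # drop items whose score fell below the threshold
        metadata[name] = sorted(labels)
    return metadata
```

In this sketch, only the resulting data type labels leave the scanning step, which mirrors the claim's statement that the generated metadata indicates types of the target data rather than the data itself.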