US 12,461,921 B2
Utilizing metadata-based classifications for data discovery in data sets
Aniruddha Ghosal, Sandy Springs, GA (US); Shane Wiggins, Atlanta, GA (US); Kotreshi Sakragoudra, Marietta, GA (US); Kevin Jones, Atlanta, GA (US); and Laurence McNally, Atlanta, GA (US)
Assigned to OneTrust, LLC, Atlanta, GA (US)
Filed by OneTrust LLC, Atlanta, GA (US)
Filed on Nov. 9, 2023, as Appl. No. 18/505,890.
Claims priority of provisional application 63/383,115, filed on Nov. 10, 2022.
Prior Publication US 2024/0160632 A1, May 16, 2024
Int. Cl. G06F 16/2457 (2019.01); G06F 16/21 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/24573 (2019.01) [G06F 16/211 (2019.01); G06F 16/285 (2019.01)]
20 Claims
 
1. A computer-implemented method comprising:
identifying, by processing hardware, a data source schema corresponding to a set of data elements for a data source;
determining, by the processing hardware, one or more suggested labels for the data source schema by:
matching metadata from the data source schema to a metadata-based recommendation from a metadata-based recommendation repository comprising metadata-based recommendations based on confidence scores between the metadata and the metadata-based recommendation, wherein the metadata-based recommendations comprise labels categorizing data elements in data sources; and
determining a suggested label from the metadata-based recommendations of the metadata-based recommendation repository when a confidence score satisfies a threshold confidence score and generating a predicted suggested label utilizing a classifier model when an additional confidence score is below the threshold confidence score;
providing, for display within a graphical user interface of a client device and by the processing hardware, one or more selectable option elements for the one or more suggested labels for the data source enabling acceptance or modification of the one or more suggested labels;
providing, for display within the graphical user interface of the client device and by the processing hardware, one or more metadata matching indicators to indicate the determination of the suggested label is determined from the metadata-based recommendation repository and that an additional suggested label is generated utilizing the classifier model; and
modifying, by the processing hardware, a data inventory by applying the one or more suggested labels from the metadata-based recommendation to an inventory object representing the data source to categorize the set of data elements.
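Claim 1 recites a two-tier labeling flow. Each schema element's metadata is scored against a repository of metadata-based recommendations. When the best confidence score satisfies a threshold, that recommendation's label is suggested. Otherwise, a classifier model predicts the label. A matching indicator tells the user which path produced each suggestion, and the accepted labels are then written to the inventory object for the data source. The Python below is a minimal sketch of that flow under stated assumptions and is not the patentee's implementation. The similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, the classifier supplied as a plain callable, and every name (`Recommendation`, `Suggestion`, `InventoryObject`, `suggest_labels`, `matching_indicator`, `apply_labels`) are hypothetical.

```python
# Hypothetical sketch of the claimed flow; names, scoring, and threshold are assumptions.
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass(frozen=True)
class Recommendation:
    """Hypothetical repository entry: a metadata pattern and the label it implies."""
    metadata: str
    label: str


@dataclass
class Suggestion:
    element: str       # data element (e.g. a column name) from the source schema
    label: str         # suggested label offered for acceptance or modification
    confidence: float  # best match score against the repository
    source: str        # "metadata" or "classifier"; drives the matching indicator


@dataclass
class InventoryObject:
    """Hypothetical inventory object representing one data source."""
    data_source: str
    labels: dict[str, str] = field(default_factory=dict)


def confidence(metadata: str, rec: Recommendation) -> float:
    # Stand-in score (assumption): case-insensitive string similarity in [0, 1].
    return SequenceMatcher(None, metadata.lower(), rec.metadata.lower()).ratio()


def suggest_labels(
    schema: list[str],
    repository: list[Recommendation],
    classifier: Callable[[str], str],
    threshold: float = 0.8,
) -> list[Suggestion]:
    suggestions = []
    for element in schema:
        # Find the recommendation whose metadata best matches this element.
        best: Optional[Recommendation] = max(
            repository, key=lambda rec: confidence(element, rec), default=None
        )
        score = confidence(element, best) if best is not None else 0.0
        if best is not None and score >= threshold:
            # Confidence satisfies the threshold: reuse the repository label.
            suggestions.append(Suggestion(element, best.label, score, "metadata"))
        else:
            # Below the threshold: fall back to the classifier model's prediction.
            suggestions.append(
                Suggestion(element, classifier(element), score, "classifier")
            )
    return suggestions


def matching_indicator(s: Suggestion) -> str:
    # Text a GUI might show next to each suggested label.
    if s.source == "metadata":
        return "Matched from metadata repository"
    return "Predicted by classifier model"


def apply_labels(inventory: InventoryObject, accepted: list[Suggestion]) -> InventoryObject:
    # Record the accepted (possibly user-modified) labels on the inventory object.
    for s in accepted:
        inventory.labels[s.element] = s.label
    return inventory
```

Suppose the repository holds `email_address -> Email`. The schema element `EmailAddress` then scores about 0.96 and reuses the repository label. The element `dob` falls below the threshold and is routed to the classifier. In practice, the scoring function and threshold would be whatever the recommendation repository defines. The sketch passes the classifier in as a callable so that any model can be substituted.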