US 11,853,329 B2
Metadata classification
Craig E. Hawco, Toronto (CA); and Joseph David Jensen, Riverhead, NY (US)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by SNOWFLAKE INC., Bozeman, MT (US)
Filed on Mar. 21, 2023, as Appl. No. 18/124,415.
Application 18/124,415 is a continuation of application No. 17/163,156, filed on Jan. 29, 2021, now Pat. No. 11,630,853.
Prior Publication US 2023/0222142 A1, Jul. 13, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/28 (2019.01); G06F 16/22 (2019.01); G06N 5/01 (2023.01)
CPC G06F 16/285 (2019.01) [G06F 16/221 (2019.01); G06N 5/01 (2023.01)]
20 Claims
 
1. A method comprising:
retrieving data from a data set, wherein the data is organized in a plurality of columns;
for each column in the plurality of columns:
generating one or more candidate semantic categories for the column, wherein each of the one or more candidate semantic categories is included in a plurality of semantic categories;
generating a probability for each of the one or more candidate semantic categories by applying a bloom filter to the data of the column;
creating a feature vector for the column from the one or more candidate semantic categories and the corresponding probabilities;
determining a semantic category of the column based on the feature vector;
determining a privacy category based on a designation of the semantic category; and
anonymizing the data in the column based on the privacy category to produce anonymized data, wherein the anonymizing comprises replacing more specific data in the column with less specific data based on a data hierarchy of the plurality of semantic categories that relates the more specific data to the less specific data, and wherein the less specific data reduces a likelihood that one or more data subjects are identifiable from the anonymized data; and
generating an anonymized view of the data set using the anonymized data.
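
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`BloomFilter`, `CATEGORY_FILTERS`, `PRIVACY`, `HIERARCHY`, `feature_vector`, `semantic_category`, `anonymized_view`) is hypothetical, as are the sample values, the single-level hierarchy, and the 0.8 threshold. The claim does not say how the semantic category is derived from the feature vector, so the sketch assumes a simple highest-score rule, and it assumes the probability is the fraction of column values that hit a per-category Bloom filter.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k hashed bit positions per item in an m-bit field."""

    def __init__(self, items, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0
        for item in items:
            for pos in self._positions(item):
                self.bits |= 1 << pos

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical reference data: one filter per candidate semantic category.
CATEGORY_FILTERS = {
    "city": BloomFilter(["toronto", "riverhead", "bozeman"]),
    "country": BloomFilter(["canada", "united states"]),
}
# Hypothetical designation of semantic categories as privacy categories.
PRIVACY = {"city": "quasi-identifier"}
# Hypothetical data hierarchy: more specific value -> less specific value.
HIERARCHY = {
    "toronto": "canada",
    "riverhead": "united states",
    "bozeman": "united states",
}


def feature_vector(values):
    """One probability per category: fraction of values that hit its filter."""
    vector = []
    for name in sorted(CATEGORY_FILTERS):
        hits = sum(v.lower() in CATEGORY_FILTERS[name] for v in values)
        vector.append(hits / max(len(values), 1))
    return vector


def semantic_category(vector, threshold=0.8):
    """Assumed rule: pick the highest-scoring category if it clears a threshold."""
    names = sorted(CATEGORY_FILTERS)
    best = max(range(len(vector)), key=vector.__getitem__)
    return names[best] if vector[best] >= threshold else None


def anonymized_view(table):
    """Generalize quasi-identifier columns; pass other columns through."""
    view = {}
    for column, values in table.items():
        category = semantic_category(feature_vector(values))
        if PRIVACY.get(category) == "quasi-identifier":
            # Replace each value with its parent in the hierarchy, or "*".
            values = [HIERARCHY.get(v.lower(), "*") for v in values]
        view[column] = values
    return view


table = {"home": ["Toronto", "Bozeman", "Riverhead"], "note": ["a", "b", "c"]}
view = anonymized_view(table)
```

In this sketch the `home` column scores 1.0 for the hypothetical `city` category, is treated as a quasi-identifier, and has each city replaced by its country. The `note` column matches no category and is left unchanged.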