US 12,380,082 B2
Method and system for automatically tagging data
Yeye He, Bellevue, WA (US); Jie Song, Ann Arbor, MI (US); Yue Wang, Bellevue, WA (US); Surajit Chaudhuri, Kirkland, WA (US); Vishal Kumar Seshagirirao Anil, Redmond, WA (US); Yaron Y. Goland, Seattle, WA (US); Gaurav Malhotra, Bothell, WA (US); and Blake Lassiter, Seattle, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 23, 2022, as Appl. No. 17/847,902.
Application 17/847,902 is a continuation of application No. 16/953,313, filed on Nov. 19, 2020, granted, now Pat. No. 11,397,716.
Prior Publication US 2022/0318221 A1, Oct. 6, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/22 (2019.01); G06F 16/23 (2019.01)
CPC G06F 16/2272 (2019.01) [G06F 16/221 (2019.01); G06F 16/2365 (2019.01)]
20 Claims
1. A computer-implemented method for automatically tagging data, the method comprising:
receiving a first set of data;
automatically generating, based on a part of the first set of data, a statistical summary of the first set of data, wherein the statistical summary includes:
a data-tagging pattern representing a value of the part of the first set of data, and
a first degree of generalizing data pattern for the data-tagging pattern;
interactively receiving, based on a user selection through a graphical user interface, a second set of data, wherein the second set of data is distinct from the first set of data;
generating, based on the data-tagging pattern in the statistical summary and data in the second set of data, a candidate data-tagging pattern;
selecting, based on a combination of a false negative rate associated with the candidate data-tagging pattern upon the data in the second set of data and a second degree of generalizing data pattern associated with the candidate data-tagging pattern upon the first set of data, the candidate data-tagging pattern as a data-tagging pattern;
automatically tagging based on the selected data-tagging pattern for creating new tagging data, a data tag to a subset of the first set of data; and
presenting, through the graphical user interface, based on the automatically tagged data tag, at least part of the automatically tagged subset of the first set of data for storing as output of processing a search query.
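
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that a data-tagging pattern is a regular expression, that the "degree of generalizing" can be approximated by the fraction of the first set of data a pattern matches, and that the false negative rate and that degree are combined as a weighted sum. The function names (to_pattern, false_negative_rate, generality, select_pattern, tag), the three generalization levels, the weight of 0.5, and the sample values are all hypothetical.

```python
import re
from itertools import groupby

DIGIT, ALPHA = r"\d", "[A-Za-z]"

def char_class(ch):
    """Map a character to a regex class (assumed ASCII-only generalization)."""
    if ch.isascii() and ch.isdigit():
        return DIGIT
    if ch.isascii() and ch.isalpha():
        return ALPHA
    return re.escape(ch)  # punctuation and other characters stay literal

def to_pattern(value, level):
    """Hypothetical generalization levels:
    0 = literal value, 1 = classes with exact run lengths, 2 = classes with '+'."""
    if level == 0:
        return re.escape(value)
    parts = []
    for cls, run in groupby(value, key=char_class):
        n = len(list(run))
        if cls not in (DIGIT, ALPHA):
            parts.append(cls * n)          # repeat the escaped literal
        elif level == 1:
            parts.append(f"{cls}{{{n}}}")  # e.g. \d{3}
        else:
            parts.append(f"{cls}+")        # e.g. \d+
    return "".join(parts)

def matches(pattern, value):
    return re.fullmatch(pattern, value) is not None

def false_negative_rate(pattern, examples):
    """Fraction of the user-selected examples the pattern fails to match.
    Assumes `examples` is non-empty."""
    return sum(not matches(pattern, v) for v in examples) / len(examples)

def generality(pattern, column):
    """Assumed proxy for the degree of generalization: the fraction of the
    first set of data matched by the pattern. Assumes `column` is non-empty."""
    return sum(matches(pattern, v) for v in column) / len(column)

def select_pattern(column, examples, weight=0.5):
    """Generate candidates from the examples and keep the lowest combined score."""
    candidates = {to_pattern(v, lvl) for v in examples for lvl in (0, 1, 2)}
    def score(p):
        return false_negative_rate(p, examples) + weight * generality(p, column)
    return min(sorted(candidates), key=score)  # sorted for a deterministic tie-break

def tag(column, pattern, label):
    """Attach the label to the subset of the column that matches the pattern."""
    return [(v, label) for v in column if matches(pattern, v)]

# First set of data (values to be tagged) and second set (user-selected examples).
column = ["425-555-0100", "206-555-0199", "2020-11-19", "n/a", "Seattle"]
examples = ["425-555-0100", "206-555-0199"]

best = select_pattern(column, examples)
tagged = tag(column, best, "phone")
```

In this sketch the literal candidates miss one of the two examples, and the loosest candidate also matches the date value in the column, so the weighted sum favors the candidate that covers both examples while matching the fewest other values. The weight that balances the two terms is an arbitrary choice made for illustration.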