US 11,868,412 B1
Data enrichment systems and methods for abbreviated domain name classification
Gaurav Mitesh Dalal, Fremont, CA (US); Ali Mesdaq, San Jose, CA (US); and Hung-Jen Chang, Fremont, CA (US)
Assigned to Proofpoint, Inc., Sunnyvale, CA (US)
Filed by Proofpoint, Inc., Sunnyvale, CA (US)
Filed on Nov. 19, 2021, as Appl. No. 17/530,931.
Application 17/530,931 is a continuation of application No. 16/370,323, filed on Mar. 29, 2019, granted, now 11,194,871.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/953 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/953 (2019.01) [G06F 16/285 (2019.01); G06F 16/288 (2019.01)]
20 Claims
 
1. A method, comprising:
receiving, through a user interface by a computer operating a data enrichment engine, an abbreviated domain name having characters;
obtaining, by the data enrichment engine, web content source code corresponding to the abbreviated domain name;
extracting, by the data enrichment engine, textual content from the web content source code corresponding to the abbreviated domain name;
capturing, by the data enrichment engine, a number of consecutive words from the textual content;
comparing, by the data enrichment engine, an initial character from each of the number of consecutive words with the characters of the abbreviated domain name;
determining, by the data enrichment engine based on the comparison, a set of words with initial characters that match characters of the abbreviated domain name to establish a relationship between the set of words and the abbreviated domain name, wherein the determining the set of words further comprises removing consecutive words from the textual content that have initial characters arranged in an order that is different from that of the characters of the abbreviated domain name;
determining, by the data enrichment engine, whether a candidate domain name based on the set of words and the abbreviated domain name are owned by a same entity and finding new domain names based on information relating to the candidate domain name; and
providing, by the data enrichment engine, the candidate domain name to a downstream computing facility for domain name classification.
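The extracting, capturing, comparing, and determining steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`extract_text`, `matching_word_sets`, `candidate_domain`), the sliding-window approach, the tokenization rule, and the default `com` suffix are all assumptions made for illustration. The sketch takes the abbreviation as a bare label (for example `ewm`, not `ewm.com`) and does not cover the same-entity ownership determination, the search for new domain names, or the hand-off to a downstream classifier.

```python
from html.parser import HTMLParser
import re


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(source_code):
    """Hypothetical helper: pull textual content out of web page source code."""
    parser = _TextExtractor()
    parser.feed(source_code)
    parser.close()
    return " ".join(parser.chunks)


def matching_word_sets(text, abbreviation):
    """Return runs of consecutive words whose initials spell the abbreviation.

    A window of len(abbreviation) consecutive words is kept only when its
    initial characters match the abbreviation in the same order; windows
    whose initials appear in a different order are discarded.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)  # assumed tokenization rule
    target = abbreviation.lower()
    n = len(target)
    matches = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        initials = "".join(w[0].lower() for w in window)
        if initials == target:
            matches.append(window)
    return matches


def candidate_domain(words, tld="com"):
    """Hypothetical helper: join the matched words into a candidate name.

    The top-level domain is an assumption; the claim does not specify one.
    """
    return "".join(w.lower() for w in words) + "." + tld


if __name__ == "__main__":
    page = (
        "<html><head><script>var x = 'ignored';</script></head>"
        "<body><p>Welcome to Example Widget Makers online.</p></body></html>"
    )
    for word_set in matching_word_sets(extract_text(page), "ewm"):
        print(word_set, candidate_domain(word_set))
```

In this sketch, order sensitivity falls out of the string comparison: a window such as "Makers Widget Example" yields the initials `mwe` and is dropped when the abbreviation is `ewm`.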