US 12,081,550 B1
Machine learning-based URL categorization system with selection between more and less specific category labels
Xinjun Zhang, San Jose, CA (US); Yi Zhang, Santa Clara, CA (US); Rongrong Tao, Fremont, CA (US); Dong Guo, San Jose, CA (US); Hongbo Yang, Palo Alto, CA (US); and Jun Ou, Cupertino, CA (US)
Assigned to Netskope, Inc., Santa Clara, CA (US)
Filed by Netskope, Inc., Santa Clara, CA (US)
Filed on Oct. 2, 2023, as Appl. No. 18/375,976.
Int. Cl. H04L 9/40 (2022.01); H04L 41/16 (2022.01)
CPC H04L 63/101 (2013.01) [H04L 41/16 (2013.01)]
20 Claims
 
1. A computer-implemented method of choosing between alternative category labels tentatively assigned to webpages by a classifier ensemble running on processors, including:
applying the classifier ensemble including at least a sensitive category classifier, a non-sensitive category classifier, a title and metadata classifier and a heuristic classifier to at least tens of thousands of webpages;
applying a post processor to outputs of the classifier ensemble and, for at least some of the webpages, tentatively assigning at least two category labels for non-sensitive categories to produce tentatively assigned category labels;
for at least some of the webpages assigned the at least two category labels, automatically determining that at least one but not all of the tentatively assigned category labels is a general label and de-selecting the general label;
saving the assigned category label that is not de-selected to the webpage; and
distributing the assigned category labels for at least some of the tens of thousands of webpages for use in controlling access to webpages by users on user systems protected using the assigned category labels.
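
The selection step recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The names GENERAL_LABELS, deselect_general and assign_labels are hypothetical, as are the example labels and URLs. The claim does not state how a label is determined to be general, so a fixed lookup set is assumed here. The tentative labels are taken as given, standing in for the output of the classifier ensemble and post processor.

```python
# Assumption: a fixed set of labels treated as "general". The claim does not
# enumerate general labels or say how they are identified.
GENERAL_LABELS = frozenset({"Technology", "Business"})


def deselect_general(tentative, general=GENERAL_LABELS):
    """Return the labels kept for one webpage.

    General labels are dropped only when at least one, but not all, of the
    tentatively assigned labels is general. Otherwise the list is unchanged.
    """
    if len(tentative) < 2:
        return list(tentative)
    general_count = sum(1 for label in tentative if label in general)
    if 0 < general_count < len(tentative):
        return [label for label in tentative if label not in general]
    return list(tentative)


def assign_labels(tentative_by_url, general=GENERAL_LABELS):
    """Apply the de-selection to every webpage and collect the saved labels.

    `tentative_by_url` maps a URL to its tentatively assigned non-sensitive
    labels (a stand-in for the post processor output).
    """
    return {
        url: deselect_general(labels, general)
        for url, labels in tentative_by_url.items()
    }


if __name__ == "__main__":
    tentative = {
        "https://example.com/hosting": ["Technology", "Web Hosting"],
        "https://example.com/about": ["Technology", "Business"],
        "https://example.com/recipes": ["Food"],
    }
    for url, labels in assign_labels(tentative).items():
        print(url, labels)
```

In this sketch a webpage labeled both "Technology" and "Web Hosting" keeps only "Web Hosting", while a webpage whose tentative labels are all general, or all specific, keeps every label. The resulting mapping is what would be saved and distributed for access control.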