| CPC H04L 63/1483 (2013.01) [H04L 41/16 (2013.01)] | 27 Claims |

|
1. A method for searching for and identifying unauthorized websites, the method comprising:
receiving, by a website detection system comprising a processing device and a memory storing computer-readable instructions, a plurality of domains from one or more data providers via a communications network;
combining, by the website detection system, the plurality of domains into a single list of domains;
splitting, by the website detection system, the single list of domains into a plurality of portions corresponding to a number of nodes;
extracting, by the website detection system, Hypertext Markup Language (“HTML”) source code from each domain of the plurality of domains, wherein the extracting is performed in parallel by each node of the number of nodes on a corresponding portion of the plurality of portions via a parallel Hypertext Transfer Protocol/Hypertext Transfer Protocol Secure (“http/https”) data transfer application;
storing, by the website detection system, the extracted HTML source code as website data in a website data database;
comparing, by the website detection system, the website data to template data of known unauthorized sites to determine a level of similarity, the template data stored in a template data database;
creating, by the website detection system, a list of potentially unauthorized sites based on the comparing;
storing, by the website detection system, the list of potentially unauthorized sites and associated website data in a potentially unauthorized sites database;
reviewing, by the website detection system, the list of potentially unauthorized sites and the associated website data to identify one or more unauthorized websites on the list of potentially unauthorized sites; and
updating, by the website detection system, the template data in the template data database to include the one or more unauthorized websites and the associated website data.
|
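
For illustration only, the combining, splitting, parallel extracting, and comparing steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the round-robin split, the removal of duplicate domains, the thread-per-node model, the injected `fetch` callable (standing in for the http/https data transfer application), the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.8 threshold are all hypothetical choices that the claim does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List


def combine_domains(feeds: Iterable[Iterable[str]]) -> List[str]:
    """Merge the provider feeds into a single list of domains.

    Assumption: domains are normalized to lowercase and duplicates are
    dropped; the claim only requires a single combined list.
    """
    seen = set()
    combined: List[str] = []
    for feed in feeds:
        for domain in feed:
            key = domain.strip().lower()
            if key and key not in seen:
                seen.add(key)
                combined.append(key)
    return combined


def split_into_portions(domains: List[str], num_nodes: int) -> List[List[str]]:
    """Split the list into one portion per node (round-robin, an assumption)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return [domains[i::num_nodes] for i in range(num_nodes)]


def extract_html(
    portions: List[List[str]], fetch: Callable[[str], str]
) -> Dict[str, str]:
    """Run one worker per portion in parallel and collect domain -> HTML.

    `fetch` is a hypothetical callable that returns the HTML source
    code for a given domain.
    """
    def run_node(portion: List[str]) -> Dict[str, str]:
        return {domain: fetch(domain) for domain in portion}

    website_data: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(portions))) as pool:
        for result in pool.map(run_node, portions):
            website_data.update(result)
    return website_data


def flag_potentially_unauthorized(
    website_data: Dict[str, str], templates: List[str], threshold: float = 0.8
) -> List[str]:
    """List the domains whose HTML is similar to any known template.

    Assumption: similarity is the SequenceMatcher ratio (0.0 to 1.0) and
    a site is flagged when its best score reaches the threshold.
    """
    flagged: List[str] = []
    for domain, html in website_data.items():
        best = max(
            (SequenceMatcher(None, html, t).ratio() for t in templates),
            default=0.0,
        )
        if best >= threshold:
            flagged.append(domain)
    return flagged
```

In this sketch, passing `fetch` as a parameter keeps the parallel step independent of any particular HTTP client, so a real transfer application or a test stub can be supplied. The reviewing and template-updating steps are omitted because the claim does not describe how they are carried out.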