| CPC G06F 16/951 (2019.01) [H04L 67/02 (2013.01); H04L 67/56 (2022.05)] | 6 Claims |

|
1. A method of crawling a website by a terminal including a processor, a communication unit and a memory storing an artificial intelligence model, the method comprising:
modifying, by the terminal, a header included in a hypertext transfer protocol (HTTP) request message to avoid bot detection;
transmitting, by the terminal, via the communication unit, the HTTP request message to a client server through a proxy server providing a dynamic Internet protocol (IP);
receiving, by the terminal, via the communication unit, a response message for accessing the website from the client server, wherein the response message includes information on a configuration of a web page of the website;
obtaining, by the terminal, a tag for checking an element displayed on a user screen of the terminal based on the information on the configuration of the web page included in the received response message;
checking, by the terminal, through the tag, a path corresponding to the element that is displayed on the user screen and can be checked by a user through the user screen; and
performing, by the terminal, the crawling based on the checked path,
wherein the method further comprises:
when a character string Captcha is encountered during the crawling, automatically solving the character string Captcha through the artificial intelligence model stored in the memory,
when a different type of Captcha than the character string Captcha is encountered during the crawling, (i) receiving an image for checking the different type of Captcha via the communication unit, (ii) transmitting the image for checking the different type of Captcha to the user through a messenger application, and (iii) solving the different type of Captcha by interacting with the user through the messenger application,
wherein the header of the HTTP request message includes a first field corresponding to user-agent and a second field corresponding to referer, the referer indicating an address of a web page from which a resource request is sent to the client server, and
wherein the modifying of the header includes:
modifying the user-agent, and
modifying the address of the web page in the referer to a domain address of the website.
|
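
The header-modification step recited at the end of claim 1 (changing the user-agent and setting the referer to the domain address of the website) can be illustrated with a minimal sketch. This is not the claimed implementation: the function name `modify_headers`, the example URL, and the example user-agent string are hypothetical, and the sketch assumes that "domain address" means the scheme and host of the target page. Proxy transmission and Captcha handling are not shown.

```python
from urllib.parse import urlsplit


def modify_headers(headers: dict, page_url: str, user_agent: str) -> dict:
    """Return a copy of `headers` with User-Agent and Referer rewritten.

    Assumption: the "domain address of the website" is taken to be the
    scheme plus host of `page_url`, e.g. https://shop.example.com/.
    """
    parts = urlsplit(page_url)
    domain_address = f"{parts.scheme}://{parts.netloc}/"

    modified = dict(headers)                 # leave the caller's dict untouched
    modified["User-Agent"] = user_agent      # first field: user-agent
    modified["Referer"] = domain_address     # second field: referer -> domain address
    return modified


if __name__ == "__main__":
    original = {"Accept": "text/html", "Referer": "https://other.example/a"}
    result = modify_headers(
        original,
        "https://shop.example.com/items/42?x=1",  # hypothetical target page
        "ExampleAgent/1.0",                        # hypothetical user-agent value
    )
    print(result)
```

The function returns a new dictionary rather than mutating its input, so the unmodified headers remain available for comparison.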