US 12,423,362 B2
Method for training isolation forest, and method for recognizing web crawler
Ke Cao, Guangzhou (CN); and Qinghua Zhong, Guangzhou (CN)
Assigned to BIGO TECHNOLOGY PTE. LTD., Singapore (SG)
Appl. No. 18/255,843
Filed by BIGO TECHNOLOGY PTE. LTD., Singapore (SG)
PCT Filed Dec. 3, 2021, PCT No. PCT/CN2021/135229
§ 371(c)(1), (2) Date Jun. 2, 2023,
PCT Pub. No. WO2022/117063, PCT Pub. Date Jun. 9, 2022.
Claims priority of application No. 202011408927.0 (CN), filed on Dec. 3, 2020.
Prior Publication US 2024/0111818 A1, Apr. 4, 2024
Int. Cl. G06F 16/951 (2019.01); G06F 16/955 (2019.01); H04L 9/40 (2022.01)
CPC G06F 16/951 (2019.01) [G06F 16/955 (2019.01); H04L 63/0236 (2013.01)]
19 Claims
 
1. A method for training isolation forests, comprising:
acquiring a plurality of categories by classifying uniform resource identifiers;
acquiring sample behavior data by monitoring a behavior of a client from each of Internet Protocol (IP) addresses in a plurality of IP addresses accessing the uniform resource identifiers under the plurality of categories;
encoding the sample behavior data as a sample access vector; and
training, based on the sample access vector, an isolation forest for recognizing a web crawler from the client;
wherein encoding the sample behavior data as the sample access vector comprises:
counting a quantity of uniform resource identifiers under each of the categories accessed by the client from each of the IP addresses in the sample behavior data; and
acquiring the sample access vector of each of the IP addresses by respectively setting, with each of the categories as a dimension of the vector, a plurality of quantities corresponding to the plurality of categories as values of a plurality of dimensions corresponding to the plurality of categories in the vector.
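
The encoding and training steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The categorization rule (first path segment of the URI), the choice to count every access rather than distinct URIs, and all function and parameter names (`categorize`, `encode`, `train_forest`, `anomaly_score`, `n_trees`, `sample_size`) are assumptions made for illustration. The forest is a simplified textbook-style isolation forest written with the Python standard library only, and it expects at least two sample access vectors.

```python
import math
import random
from collections import Counter
from urllib.parse import urlparse

EULER_GAMMA = 0.5772156649

def categorize(uri):
    """Hypothetical rule: treat the first path segment as the URI category."""
    segments = [s for s in urlparse(uri).path.split("/") if s]
    return segments[0] if segments else "root"

def encode(logs, categories):
    """Map each IP to a vector with one dimension per category.

    logs is an iterable of (ip, uri) pairs; each value in the vector is the
    number of accesses by that IP to URIs under the corresponding category
    (assumption: repeated accesses to the same URI are counted each time).
    """
    per_ip = {}
    for ip, uri in logs:
        per_ip.setdefault(ip, Counter())[categorize(uri)] += 1
    return {ip: [c[cat] for cat in categories] for ip, c in per_ip.items()}

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random dimension at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen dimension is constant
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(point, node):
    """Depth at which the point lands, adjusted for unsplit leaf size."""
    depth = 0
    while "dim" in node:
        side = "left" if point[node["dim"]] < node["split"] else "right"
        node = node[side]
        depth += 1
    return depth + c_factor(node["size"])

def train_forest(vectors, n_trees=100, sample_size=256, seed=0):
    """Train the forest; returns (trees, effective subsample size)."""
    rng = random.Random(seed)
    size = min(sample_size, len(vectors))
    max_depth = math.ceil(math.log2(max(size, 2)))
    trees = [build_tree(rng.sample(vectors, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size

def anomaly_score(point, trees, size):
    """Score in (0, 1]; values closer to 1 mean easier to isolate."""
    mean_path = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(size))
```

In use, `encode` would be called on the monitored `(ip, uri)` records with the chosen category list, `train_forest` on the resulting vectors, and `anomaly_score` on the vector of any IP under inspection. An IP whose per-category counts are very unlike the rest tends to be isolated after few splits and so receives a higher score; any threshold for flagging a crawler is left to the operator.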