US 12,254,037 B2
Log parsing method and device, server and storage medium
Jing Han, Guangdong (CN); Jianwei Liu, Guangdong (CN); Li Chen, Guangdong (CN); Feng Ye, Guangdong (CN); Zheng Liu, Guangdong (CN); and Hang Ling, Guangdong (CN)
Assigned to ZTE CORPORATION, Shenzhen (CN)
Appl. No. 17/624,243
Filed by ZTE CORPORATION, Guangdong (CN)
PCT Filed Sep. 2, 2020, PCT No. PCT/CN2020/113060 § 371(c)(1), (2) Date Dec. 30, 2021, PCT Pub. No. WO2021/052177, PCT Pub. Date Mar. 25, 2021.
Claims priority of application No. 201910893383.2 (CN), filed on Sep. 20, 2019.
Prior Publication US 2022/0365957 A1, Nov. 17, 2022
Int. Cl. G06F 16/00 (2019.01); G06F 16/31 (2019.01); G06F 16/353 (2025.01)

CPC G06F 16/353 (2019.01) [G06F 16/322 (2019.01)]

19 Claims

1. A method for parsing logs, the method being implemented in cooperation with a log parsing device, the log parsing device being communicatively connected with a communication component, the communication component receiving and sending data under the control of at least one processor, the at least one processor being communicatively connected with a memory, the memory storing instructions executable by the at least one processor; and the method comprising:

acquiring, by the at least one processor, sample log data, wherein the acquired sample log data includes a plurality of sample logs, and irrelevant information is filtered out from the plurality of sample logs by a regular expression;

performing, by the at least one processor, clustering processing on the acquired sample log data according to a length of each sample log in the acquired sample log data, and beginning and ending keywords of each sample log in the acquired sample log data, to obtain a plurality of log clusters;

determining, by the at least one processor, a quality score of each log cluster of the obtained plurality of log clusters, wherein determining the quality score of each log cluster of the obtained plurality of log clusters comprises:

determining the quality score of each log cluster according to a compactness of each log cluster of all log clusters and a separation between different log clusters, wherein the quality score is a normalized product of the compactness and the separation; and

parsing, by the at least one processor, a log online by using the obtained plurality of log clusters and determined quality scores of the obtained plurality of log clusters so as to be used for construction of program workflow and anomaly detection in a system;

wherein parsing the log includes determining an adaptive similarity threshold for the log using quality scores of the obtained plurality of log clusters; and

wherein parsing the log online by using the obtained plurality of log clusters and the determined quality scores of the obtained plurality of log clusters comprises:

acquiring a target log to be parsed;

filtering out irrelevant information in the target log by using the regular expression;

determining a matched log cluster among the obtained plurality of log clusters according to a length and beginning and ending keywords of the target log, and determining an adaptive similarity between the target log and the matched log cluster;

determining an adaptive similarity threshold for the target log and the matched log cluster according to a quality score of the matched log cluster; and

determining whether the adaptive similarity is greater than the adaptive similarity threshold; in response to the adaptive similarity being greater than the adaptive similarity threshold, inserting the target log into the matched log cluster; in response to the adaptive similarity not being greater than the adaptive similarity threshold, creating a log cluster by using the target log;

wherein determining the matched log cluster among the obtained plurality of log clusters according to the length and beginning and ending keywords of the target log comprises:

using the length and the beginning and ending keywords of the log to be parsed directly as keys;

acquiring, by querying a cache, a log cluster from the plurality of log clusters whose log length and beginning and ending keywords match those of the log to be parsed.
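The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not give formulas for compactness, separation, similarity, or the threshold, so every concrete choice below is an assumption made for illustration. These assumptions are the noise pattern `NOISE`, the position-wise `similarity`, the first log of a cluster serving as its representative, Jaccard-based `separation`, normalization by the largest raw product, the linear threshold bounds `lo` and `hi`, the default quality of a newly created cluster, and a cache that keeps a list of clusters per key.

```python
import re
from collections import defaultdict

# Assumed noise patterns (IPv4 addresses, hex values, integers);
# the claim only says "a regular expression".
NOISE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b|\b0x[0-9a-fA-F]+\b|\b\d+\b")

def tokenize(line):
    """Filter irrelevant information with the regex, then split on whitespace."""
    return NOISE.sub("<*>", line).split()

def key_of(tokens):
    """Length plus beginning and ending keywords, used directly as the key."""
    return (len(tokens), tokens[0], tokens[-1])

def similarity(a, b):
    """Assumed measure: fraction of positions holding the same token."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def compactness(logs):
    """Assumed: mean similarity of each log to the cluster's first log."""
    return sum(similarity(logs[0], t) for t in logs) / len(logs)

def separation(rep, other_reps):
    """Assumed: 1 minus the largest Jaccard overlap with another cluster."""
    if not other_reps:
        return 1.0
    s = set(rep)
    return 1.0 - max(len(s & set(o)) / len(s | set(o)) for o in other_reps)

def build_cache(sample_lines):
    """Offline step: cluster sample logs by key and score each cluster."""
    groups = defaultdict(list)
    for line in sample_lines:
        tokens = tokenize(line)
        if tokens:
            groups[key_of(tokens)].append(tokens)
    clusters = [{"key": k, "logs": v, "quality": 0.0} for k, v in groups.items()]
    raw = []
    for c in clusters:
        others = [o["logs"][0] for o in clusters if o is not c]
        raw.append(compactness(c["logs"]) * separation(c["logs"][0], others))
    top = max(raw, default=0.0) or 1.0  # normalize by the largest product
    cache = {}
    for c, r in zip(clusters, raw):
        c["quality"] = r / top
        cache[c["key"]] = [c]
    return cache

def parse_online(line, cache, lo=0.3, hi=0.7):
    """Online step: look up by key, then insert into or create a cluster."""
    tokens = tokenize(line)
    if not tokens:
        return None
    key = key_of(tokens)
    candidates = cache.get(key, [])
    best = max(candidates, key=lambda c: similarity(c["logs"][0], tokens),
               default=None)
    if best is not None:
        sim = similarity(best["logs"][0], tokens)
        threshold = lo + (hi - lo) * best["quality"]  # assumed linear mapping
        if sim > threshold:
            best["logs"].append(tokens)
            return best
    new = {"key": key, "logs": [tokens], "quality": 1.0}  # assumed default
    cache.setdefault(key, []).append(new)
    return new
```

A caller would run `build_cache` once over the sample logs and then call `parse_online` for each incoming log. Because the key is a plain tuple, finding the matched cluster is a single dictionary lookup rather than a scan over all clusters, which is the point of using the length and keywords directly as keys.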