CPC G06F 16/353 (2019.01) [G06F 16/322 (2019.01)] | 19 Claims |
1. A method for parsing logs, the method being implemented in cooperation with a log parsing device, the log parsing device being communicatively connected with a communication component, the communication component receiving and sending data under the control of at least one processor, the at least one processor being communicatively connected with a memory, the memory storing instructions executable by the at least one processor; and the method comprising:
acquiring, by the at least one processor, sample log data, wherein the acquired sample log data includes a plurality of sample logs, and irrelevant information is filtered out from the plurality of sample logs by a regular expression;
performing, by the at least one processor, clustering processing on the acquired sample log data according to a length and beginning and ending keywords of each sample log in the acquired sample log data, to obtain a plurality of log clusters;
determining, by the at least one processor, a quality score of each log cluster of the obtained plurality of log clusters, wherein determining the quality score of each log cluster comprises:
determining the quality score of each log cluster according to a compactness of that log cluster and a separation between different log clusters of the obtained plurality of log clusters, wherein the quality score is a normalized product of the compactness and the separation; and
parsing, by the at least one processor, a log online by using the obtained plurality of log clusters and the determined quality scores of the obtained plurality of log clusters, for use in constructing a program workflow and detecting anomalies in a system;
wherein parsing the log includes determining an adaptive similarity threshold for the log using the determined quality scores of the obtained plurality of log clusters; and
wherein parsing the log online by using the obtained plurality of log clusters and the determined quality scores of the obtained plurality of log clusters comprises:
acquiring a target log to be parsed;
filtering out irrelevant information in the target log by using the regular expression;
determining a matched log cluster among the obtained plurality of log clusters according to a length and beginning and ending keywords of the target log, and determining an adaptive similarity between the target log and the matched log cluster;
determining an adaptive similarity threshold for the target log and the matched log cluster according to a quality score of the matched log cluster; and
determining whether the adaptive similarity is greater than the adaptive similarity threshold; in response to the adaptive similarity being greater than the adaptive similarity threshold, inserting the target log into the matched log cluster; and in response to the adaptive similarity not being greater than the adaptive similarity threshold, creating a new log cluster from the target log;
wherein determining the matched log cluster among the obtained plurality of log clusters according to the length and beginning and ending keywords of the target log comprises:
using the length and the beginning and ending keywords of the target log directly as keys; and
acquiring, by querying a cache with the keys, a log cluster from the plurality of log clusters whose log length and beginning and ending keywords match the target log.
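The offline steps of acquiring sample logs, filtering them with a regular expression, and clustering them by length and beginning and ending keywords can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claim does not identify which information is irrelevant, so a leading timestamp, IPv4 addresses, hexadecimal identifiers and bare numbers are assumed to be irrelevant here, and the names `filter_log`, `cluster_key` and `cluster_samples` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical "irrelevant information" patterns; the claim does not list them.
LEADING_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?\s*")
VARIABLE_FIELDS = re.compile(
    r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?\b"  # IPv4 address, optional port
    r"|\b0x[0-9a-fA-F]+\b"                   # hexadecimal identifier
    r"|\b\d+\b"                              # bare number
)

Key = tuple[int, str, str]


def filter_log(line: str) -> list[str]:
    """Drop the leading timestamp, mask variable fields as '<*>', then tokenize."""
    line = LEADING_TIMESTAMP.sub("", line)
    line = VARIABLE_FIELDS.sub("<*>", line)
    return line.split()


def cluster_key(tokens: list[str]) -> Key:
    """(length, beginning keyword, ending keyword) of a filtered log."""
    return (len(tokens), tokens[0], tokens[-1])


def cluster_samples(sample_logs: list[str]) -> dict[Key, list[list[str]]]:
    """Group filtered sample logs that share the same cluster key."""
    clusters: dict[Key, list[list[str]]] = defaultdict(list)
    for raw in sample_logs:
        tokens = filter_log(raw)
        if tokens:  # ignore lines left empty after filtering
            clusters[cluster_key(tokens)].append(tokens)
    return dict(clusters)
```

Logs that share a token count, a first token and a last token after filtering fall into the same cluster; for example, two `Connection from <ip> closed` lines with different timestamps and addresses both map to the key `(4, "Connection", "closed")`.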
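The claim defines the quality score only as a normalized product of a cluster's compactness and its separation from other clusters. The sketch below fills in one possible reading, and every definition in it is an assumption: compactness is taken as the mean positional token agreement between each member log and the cluster template, separation as one minus the highest agreement between the cluster's template and any other cluster's template, and normalization as division by the largest product so that the best cluster scores 1.0. The helper names `similarity`, `template` and `quality_scores` are hypothetical.

```python
from collections.abc import Hashable

Tokens = list[str]


def similarity(a: Tokens, b: Tokens) -> float:
    """Share of aligned positions holding identical tokens (assumed measure)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest


def template(logs: list[Tokens]) -> Tokens:
    """Keep tokens shared by every member log; mask the others as '<*>'."""
    return [col[0] if all(t == col[0] for t in col) else "<*>" for col in zip(*logs)]


def quality_scores(clusters: dict[Hashable, list[Tokens]]) -> dict[Hashable, float]:
    """Normalized product of compactness and separation for each non-empty cluster."""
    templates = {k: template(v) for k, v in clusters.items()}
    raw: dict[Hashable, float] = {}
    for k, logs in clusters.items():
        # Compactness: how closely members agree with their own template.
        compactness = sum(similarity(log, templates[k]) for log in logs) / len(logs)
        # Separation: distance from the most similar other cluster.
        others = [similarity(templates[k], templates[o]) for o in clusters if o != k]
        separation = 1.0 - max(others, default=0.0)
        raw[k] = compactness * separation
    top = max(raw.values(), default=0.0)
    # Scale by the best product so scores fall in [0, 1].
    return {k: (v / top if top > 0 else 0.0) for k, v in raw.items()}
```

Under these assumptions a cluster whose members differ in one of four positions has compactness 0.75, and if it is fully separated from a perfectly compact cluster its normalized score is 0.75.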
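The online parsing steps (filtering the target log, querying a cache keyed directly on length and beginning and ending keywords, comparing an adaptive similarity with a quality-dependent threshold, and either inserting the log or creating a new cluster) can be sketched as below. The claim does not define the adaptive similarity or how the threshold is derived from the quality score, so both are assumptions here: the similarity counts template wildcards (`<*>`) as matches, and the threshold rises linearly from a hypothetical `T_MIN` to `T_MAX` with the cluster's quality. The cache holds a list per key, since a newly created cluster can share a key with an existing one. `OnlineParser`, `DEFAULT_QUALITY` and the template-update step are likewise hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical masking regex, standing in for the claim's unspecified expression.
VARIABLE_FIELDS = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?\b|\b0x[0-9a-fA-F]+\b|\b\d+\b")
T_MIN, T_MAX = 0.5, 0.9   # hypothetical bounds of the adaptive threshold
DEFAULT_QUALITY = 0.5     # hypothetical score for clusters created online

Key = tuple[int, str, str]


@dataclass
class Cluster:
    template: list[str]
    quality: float
    logs: list[list[str]] = field(default_factory=list)


def adaptive_similarity(tokens: list[str], template: list[str]) -> float:
    """Share of positions that agree, treating '<*>' in the template as a wildcard."""
    hits = sum(p == "<*>" or p == t for t, p in zip(tokens, template))
    return hits / len(template)


def threshold_for(quality: float) -> float:
    """Assumed linear map: higher-quality clusters demand closer matches."""
    return T_MIN + (T_MAX - T_MIN) * quality


class OnlineParser:
    def __init__(self) -> None:
        # Cache keyed directly on (length, beginning keyword, ending keyword).
        self.cache: dict[Key, list[Cluster]] = {}

    def add_cluster(self, cluster: Cluster) -> None:
        t = cluster.template
        self.cache.setdefault((len(t), t[0], t[-1]), []).append(cluster)

    def parse(self, raw: str) -> Cluster:
        tokens = VARIABLE_FIELDS.sub("<*>", raw).split()
        if not tokens:
            raise ValueError("log is empty after filtering")
        candidates = self.cache.get((len(tokens), tokens[0], tokens[-1]), [])
        if candidates:
            matched = max(candidates, key=lambda c: adaptive_similarity(tokens, c.template))
            if adaptive_similarity(tokens, matched.template) > threshold_for(matched.quality):
                matched.logs.append(tokens)
                # Assumed step: generalise positions where the new log differs.
                matched.template = ["<*>" if p != t else p for p, t in zip(matched.template, tokens)]
                return matched
        # No cached cluster is similar enough: start a new one from the target log.
        created = Cluster(template=list(tokens), quality=DEFAULT_QUALITY, logs=[tokens])
        self.add_cluster(created)
        return created
```

With a seed cluster `User <*> logged in` of quality 0.75, the threshold is 0.8: `User alice logged in` scores 1.0 and joins the seed, while `User carol was in` scores 0.75 and starts a new cluster under the same cache key.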