CPC G06F 16/90344 (2019.01) [G06F 40/30 (2020.01)] | 16 Claims |
9. A method for obtaining a regular expression, comprising:
receiving a first plurality of text strings containing at least one of a malignant text string and a benign text string;
partitioning the first plurality of text strings into a plurality of substrings;
creating a token pool including one or more tokens, wherein each unique substring of the plurality of substrings is included in the token pool;
generating a population list including a second plurality of text strings, the second plurality of text strings comprising a second text string of one or more tokens chosen from a predefined regular expression and a substring of the plurality of substrings;
iteratively updating the second plurality of text strings by computing a respective score for each text string in the second plurality of text strings depending on how the respective text string matches at least one of the malignant text string and the benign text string, and replacing a token in the respective text string with a candidate token that increases a matching level with the benign text string based on the respective score;
creating subsequent iterations of the population list by updating the second plurality of text strings included in the population list;
selecting, based on the computed scores associated with the second plurality of text strings in the population list, a third text string as a first regular expression from the updated second plurality of text strings.
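The steps of claim 9 (partition, build a token pool, seed a population, iteratively score and replace tokens, then select the best-scoring string as the regular expression) can be illustrated with the minimal Python sketch below. It is not the patented implementation. Everything in it is an assumption made for illustration: the partitioning rule, the contents of the hypothetical `PREDEFINED_TOKENS` list, the seeding rule (one candidate per malignant string), and the hypothetical `score` function, which rewards matching malignant strings and penalizes matching benign ones. The claim states the replacement criterion in terms of a matching level with the benign text string, and that criterion may differ. The sketch also uses deterministic single-token hill climbing in place of any particular evolutionary operator. All function names (`partition`, `build_token_pool`, `score`, `improve`, `evolve_regex`) are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical tokens standing in for "a predefined regular expression";
# the claim does not say which tokens are used.
PREDEFINED_TOKENS: List[str] = [r"\d+", r"[a-z]+", r"\w+", r"\W", r".+"]


def partition(text: str) -> List[str]:
    """Assumed rule: split into letter runs, digit runs, and single other characters."""
    return re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d]", text)


def build_token_pool(strings: Sequence[str]) -> List[str]:
    """Each unique substring becomes one token, escaped so it matches literally."""
    return sorted({re.escape(sub) for s in strings for sub in partition(s)})


def score(tokens: Sequence[str], malignant: Sequence[str], benign: Sequence[str]) -> float:
    """Hypothetical fitness: share of malignant strings matched minus share of benign ones."""
    try:
        rx = re.compile("".join(tokens))
    except re.error:
        return float("-inf")
    mal = sum(rx.fullmatch(s) is not None for s in malignant) / max(len(malignant), 1)
    ben = sum(rx.fullmatch(s) is not None for s in benign) / max(len(benign), 1)
    return mal - ben


def improve(tokens: Sequence[str], candidates: Sequence[str],
            malignant: Sequence[str], benign: Sequence[str]) -> Tuple[List[str], float]:
    """Replace one token with the candidate that raises the score most, if any does."""
    best, best_score = list(tokens), score(tokens, malignant, benign)
    for i in range(len(tokens)):
        for cand in candidates:
            trial = list(tokens)
            trial[i] = cand
            s = score(trial, malignant, benign)
            if s > best_score:  # strict: ties keep the earlier choice
                best, best_score = trial, s
    return best, best_score


def evolve_regex(malignant: Sequence[str], benign: Sequence[str], max_iters: int = 20) -> str:
    # Token pool built from every input string, malignant and benign alike.
    pool = build_token_pool(list(malignant) + list(benign))
    candidates = PREDEFINED_TOKENS + pool
    # Assumed seeding: one candidate per malignant string, made of its own substrings.
    population = [[re.escape(sub) for sub in partition(s)] for s in malignant]
    scores = [score(p, malignant, benign) for p in population]
    for _ in range(max_iters):
        updated = [improve(p, candidates, malignant, benign) for p in population]
        new_scores = [s for _, s in updated]
        if new_scores == scores:  # no candidate improved: stop early
            break
        population, scores = [p for p, _ in updated], new_scores
    # Select the highest-scoring candidate as the resulting regular expression.
    best_index = max(range(len(population)), key=scores.__getitem__)
    return "".join(population[best_index])


if __name__ == "__main__":
    malignant = ["id=123", "id=4567", "id=89"]
    benign = ["name=bob", "name=alice"]
    print(evolve_regex(malignant, benign))  # prints id=\d+
```

In this toy run, each seed such as `id=123` initially matches only its own string. Replacing the literal digits with the predefined token `\d+` yields `id=\d+`, which matches all three malignant strings and neither benign one. No further single-token replacement can raise the score, so the loop stops.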