US 12,086,188 B2
Generation of regular expressions
Jonathan Dean Smith, Katy, TX (US)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by salesforce.com, inc., San Francisco, CA (US)
Filed on Jul. 12, 2021, as Appl. No. 17/372,830.
Application 17/372,830 is a continuation of application No. 16/423,422, filed on May 28, 2019, granted, now 11,086,939.
Prior Publication US 2021/0342402 A1, Nov. 4, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/22 (2006.01); G06F 16/903 (2019.01); G06F 40/30 (2020.01)
CPC G06F 16/90344 (2019.01) [G06F 40/30 (2020.01)]
16 Claims
 
9. A method for obtaining a regular expression, comprising:
receiving a first plurality of text strings containing at least one of a malignant text string and a benign text string;
partitioning the first plurality of text strings into a plurality of substrings;
creating a token pool including one or more tokens, wherein each unique substring of the plurality of substrings is included in the token pool;
generating a population list including a second plurality of text strings that comprises a second text string of one or more tokens that are chosen from a predefined regular expression and a substring of the plurality of substrings;
iteratively updating the second plurality of text strings by computing a respective score for each text string in the second plurality of text strings depending on how the respective text string matches at least one of the malignant text string and the benign text string, and replacing a token in the respective text string with a candidate token that increases a matching level with the benign text string based on the respective score;
creating subsequent iterations of the population list by updating the second plurality of text strings included in the population list;
selecting, based on the computed scores associated with the second plurality of text strings in the population list, a third text string as a first regular expression from the updated second plurality of text strings.
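
The steps recited in claim 9 can be illustrated with a minimal Python sketch. It is not the patented implementation. The fixed-size partitioning, the PREDEFINED_TOKENS list, the score formula (benign full matches minus malignant full matches), and every function and parameter name are hypothetical choices made for illustration only; the claim does not specify any of them.

```python
import random
import re

# Hypothetical predefined regex tokens; the claim does not enumerate them.
PREDEFINED_TOKENS = [r"\d", r"\w", r".", r"\d+", r"\w+"]


def partition(strings, size=2):
    """Split each string into fixed-size substrings (one possible partitioning)."""
    return [s[i:i + size] for s in strings for i in range(0, len(s), size)]


def build_token_pool(substrings):
    """Each unique substring becomes an escaped literal token; predefined tokens are appended."""
    literals = sorted({re.escape(sub) for sub in substrings})
    return literals + PREDEFINED_TOKENS


def score(candidate, benign, malignant):
    """Assumed score: benign full matches minus malignant full matches."""
    try:
        pattern = re.compile("".join(candidate))
    except re.error:
        return float("-inf")  # an invalid pattern can never be selected
    hits = sum(1 for s in benign if pattern.fullmatch(s))
    false_hits = sum(1 for s in malignant if pattern.fullmatch(s))
    return hits - false_hits


def evolve(benign, malignant, pop_size=20, length=3, generations=30, seed=0):
    """Iteratively update a population of token lists and return the best as a regex."""
    rng = random.Random(seed)
    pool = build_token_pool(partition(benign + malignant))
    # Population list: each member is a list of tokens drawn from the pool.
    population = [[rng.choice(pool) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        next_population = []
        for candidate in population:
            mutated = list(candidate)
            mutated[rng.randrange(length)] = rng.choice(pool)
            # Keep the replacement token only if it raises the score.
            if score(mutated, benign, malignant) > score(candidate, benign, malignant):
                next_population.append(mutated)
            else:
                next_population.append(candidate)
        population = next_population
    best = max(population, key=lambda c: score(c, benign, malignant))
    return "".join(best)
```

Calling evolve(["2021-07", "2019-05"], ["hello", "DROP"]) returns a pattern string built from pool tokens. Because a replacement is kept only when it raises the score, no candidate's score decreases from one iteration of the population list to the next.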