US 11,782,983 B1
Expanded character encoding to enhance regular expression filter capabilities
Pritish Pravin Malavade, San Jose, CA (US); Nigel Antoine Gulstone, San Jose, CA (US); and Kalaiselvi Kamaraj, Foster City, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Nov. 27, 2020, as Appl. No. 17/106,010.
Int. Cl. G06F 16/00 (2019.01); G06F 16/9032 (2019.01); G06F 16/9035 (2019.01); G06F 16/30 (2019.01)
CPC G06F 16/90324 (2019.01) [G06F 16/9035 (2019.01)]
20 Claims
1. One or more integrated circuits, configured to implement a regular expression filter, the regular expression filter comprising:
a match and substitution engine, configured to:
compare two or more adjacent characters in a stream of characters being processed through the match and substitution engine with a rule to recognize whether the two or more adjacent characters are a match for a regular expression, wherein the rule instructs the match and substitution engine to replace the two or more adjacent characters with a symbol to substitute for the two or more adjacent characters;
based on the comparison, identify the two or more adjacent characters in the stream of characters according to the rule as the match for evaluating the regular expression; and
for a first character of the two or more adjacent characters:
output the symbol in place of the first character of the two or more adjacent characters in the stream of characters; and
output an enabled signal for the symbol;
for a second character of the two or more adjacent characters that occurs after the first character of the two or more adjacent characters in the stream of characters:
output the second character in the stream of characters; and
output a disabled signal for the second character;
one or more non-deterministic finite automaton (NFA) states, configured to process:
the symbol as a replacement of the first character in the two or more adjacent characters in the stream of characters; and
the second character of the two or more adjacent characters in the stream of characters as a no-op according to the disabled signal output from the match and substitution engine; and
a final acceptor, configured to provide a match signal that indicates whether the stream of characters matches the regular expression based on respective outputs of the one or more NFA states.
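
The data flow recited in claim 1 can be illustrated with a minimal software sketch. This is not the patented hardware implementation; it only models the behavior under stated assumptions. The names `match_and_substitute`, `nfa_matches`, and `NEWLINE` are hypothetical, as are the choice of "\r\n" as the adjacent-character rule, the value 256 as a symbol outside the 8-bit character range, the treatment of any characters after the second as also disabled, and the use of a simple linear chain of NFA states.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical expanded-encoding symbol: a value outside the 8-bit range.
NEWLINE = 256


def match_and_substitute(
    data: bytes, rule: bytes = b"\r\n", symbol: int = NEWLINE
) -> Iterator[Tuple[int, bool]]:
    """Yield one (value, enabled) pair per input character.

    When the adjacent characters in `rule` are found, the first character
    is replaced by `symbol` with an enabled signal, and each remaining
    character is passed through with a disabled signal.
    """
    i = 0
    n = len(rule)
    while i < len(data):
        if data[i:i + n] == rule:
            yield symbol, True           # symbol in place of first character
            for c in data[i + 1:i + n]:  # later characters: output, disabled
                yield c, False
            i += n
        else:
            yield data[i], True          # ordinary character, enabled
            i += 1


def nfa_matches(pattern: List[int], stream: Iterable[Tuple[int, bool]]) -> bool:
    """Run a linear chain of NFA states over the stream.

    State k is active once pattern[:k] has been consumed. A disabled
    entry is a no-op: the set of active states is left unchanged.
    """
    active = {0}
    for value, enabled in stream:
        if not enabled:
            continue  # no-op according to the disabled signal
        active = {
            k + 1 for k in active if k < len(pattern) and pattern[k] == value
        }
    # Final acceptor: match signal is true if the last state is active.
    return len(pattern) in active


if __name__ == "__main__":
    pattern = [ord("a"), NEWLINE, ord("b")]
    print(list(match_and_substitute(b"a\r\nb")))
    print(nfa_matches(pattern, match_and_substitute(b"a\r\nb")))
    print(nfa_matches(pattern, match_and_substitute(b"a\nb")))
```

In this sketch the stream keeps its original length: the second character still occupies its position but does not advance or clear any NFA state, which mirrors the enabled and disabled signals described in the claim.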