US 11,983,223 B2
Finite automaton construction using regular expression derivatives to simulate behavior of a backtracking engine
Olli Ilari Saarikivi, Seattle, WA (US); Margus Veanes, Bellevue, WA (US); Stephen Harris Toub, Winchester, MA (US); Daniel J. Moseley, Jackson, WY (US); and Jose Rodrigo Perez Rodriguez, North Bend, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Aug. 18, 2022, as Appl. No. 17/890,931.
Prior Publication US 2024/0061885 A1, Feb. 22, 2024
Int. Cl. G06F 16/903 (2019.01); G06F 16/901 (2019.01)
CPC G06F 16/90344 (2019.01) [G06F 16/9024 (2019.01)]
21 Claims
1. A system comprising:
  a memory; and
  a processing system coupled to the memory, the processing system configured to:
    determine behavior of a backtracking engine, the behavior indicating an order in which a plurality of paths in an input regular expression are to be evaluated by the backtracking engine;
    construct a finite automaton that represents the input regular expression using a plurality of regular expression derivatives that are based on the input regular expression, the finite automaton including a graph that includes a root node that represents the input regular expression, construction of the finite automaton comprising:
      derive a plurality of regular expressions such that each regular expression of the plurality of regular expressions is a regular expression derivative of the input regular expression with regard to a character in an alphabet or a regular expression derivative of another regular expression of the plurality of regular expressions with regard to a character in the alphabet;
      assign a plurality of relative priorities to each plurality of respective related alternations in the plurality of regular expressions to correspond to an order in which the behavior indicates that the respective plurality of related alternations are to be evaluated by the backtracking engine;
      cause a plurality of nodes that represent the plurality of respective regular expressions to be included in the graph; and
      cause a plurality of transitions between respective pairs of nodes in a corpus of nodes that includes the root node and the plurality of nodes to be included in the graph; and
    assign a plurality of priorities to a plurality of respective branches of the graph in the finite automaton, the plurality of priorities corresponding to the order in which the plurality of respective paths in the input regular expression are to be evaluated by the backtracking engine.
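
The construction recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, only one way to read the steps under stated assumptions: regular expressions are nested tuples, an alternation keeps its alternatives in the order a backtracking engine would try them (earlier means higher priority), and a nullable left operand of a concatenation is assumed to be greedy, so "continue in the left operand" is ranked above "skip it". All names (`alt`, `cat`, `star`, `deriv`, `build_automaton`, `branch_priorities`, `max_states`) are hypothetical and chosen for illustration.

```python
from collections import deque

# Regexes are nested tuples. In ("alt", r1, r2, ...) earlier entries have
# HIGHER priority, mirroring the order a backtracking engine tries them.
EMPTY = ("empty",)  # matches nothing
EPS = ("eps",)      # matches only the empty string

def char(c):
    return ("chr", c)

def alt(*rs):
    """Ordered alternation: flatten, drop EMPTY, keep first occurrence of duplicates."""
    out = []
    for r in rs:
        parts = r[1:] if r[0] == "alt" else (r,)
        for p in parts:
            if p != EMPTY and p not in out:
                out.append(p)
    if not out:
        return EMPTY
    return out[0] if len(out) == 1 else ("alt", *out)

def cat(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)

def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return any(nullable(p) for p in r[1:])  # alt

def deriv(r, c):
    """Derivative of r with regard to character c, preserving alternation order."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(*(deriv(p, c) for p in r[1:]))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    # cat: assumption (greedy): staying in the left operand outranks skipping it
    head = cat(deriv(r[1], c), r[2])
    return alt(head, deriv(r[2], c)) if nullable(r[1]) else head

def build_automaton(root, alphabet, max_states=1000):
    """Explore derivatives breadth-first; nodes are regexes, edges are characters."""
    nodes, transitions, queue = [root], {}, deque([root])
    while queue:
        state = queue.popleft()
        for c in alphabet:
            target = deriv(state, c)
            if target == EMPTY:
                continue  # dead branch: no transition recorded
            transitions[(state, c)] = target
            if target not in nodes:
                if len(nodes) >= max_states:  # safety guard for this sketch
                    raise RuntimeError("state limit exceeded")
                nodes.append(target)
                queue.append(target)
    return nodes, transitions

def branch_priorities(node):
    """Rank the top-level alternatives of a node; 0 is tried first."""
    branches = node[1:] if node[0] == "alt" else (node,)
    return {b: rank for rank, b in enumerate(branches)}

# Example: the pattern a|ab over the alphabet {a, b}.
root = alt(char("a"), cat(char("a"), char("b")))
nodes, transitions = build_automaton(root, "ab")
```

In this example the node reached from the root on `a` is the ordered alternation of the empty-string regex followed by `b`, so stopping after `a` ranks above continuing to `ab`, which is the choice a backtracking engine makes for `a|ab`. Reversing the alternatives in the input reverses that ranking. The sketch does not handle lazy quantifiers, anchors, or captures, and the `max_states` cap is only a guard, not part of the claimed method.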