CPC G06F 16/24557 (2019.01) [G06F 16/2272 (2019.01); G06F 16/283 (2019.01); G06F 16/9035 (2019.01)] | 20 Claims |
1. A system comprising:
at least one hardware processor; and
at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising:
receiving a query directed at a source table organized into a set of batch units, the query comprising a regular expression search pattern;
converting the regular expression search pattern to a pruning index predicate, the converting of the regular expression search pattern to the pruning index predicate comprising:
generating an expression tree comprising a tree data structure that includes a set of substring literals extracted from the regular expression search pattern; and
removing, from the expression tree, a node corresponding to a substring literal that does not produce an N-gram;
generating a set of N-grams based on the expression tree;
identifying, using a pruning index, a subset of batch units to scan for data matching the query based on the set of N-grams, the pruning index indexing distinct N-grams in each column of the source table; and
processing the query by scanning the subset of batch units.
|
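The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and every name in it (`Node`, `build_tree`, `prune_tree`, `may_match`, `build_index`, `run_query`) is hypothetical. It assumes N = 3, a single text column, and a restricted pattern syntax (plain characters, `.`, the quantifiers `*`, `+`, `?`, and top-level `|`, with no escapes, groups, or character classes). The pruning index is modeled as an exact set of distinct N-grams per batch unit; a production index would more likely be a compact or probabilistic structure.

```python
import re
from dataclasses import dataclass, field

N = 3  # assumed N-gram length


@dataclass
class Node:
    kind: str                      # "OR", "AND", or "LIT"
    literal: str = ""
    children: list = field(default_factory=list)


def build_tree(pattern: str) -> Node:
    """Expression tree: OR over alternatives, AND over each alternative's literals."""
    branches = []
    for alt in pattern.split("|"):
        # A quantified character or a bare '.' ends the current literal run.
        pieces = re.split(r".[*+?]|\.", alt)
        lits = [Node("LIT", literal=p) for p in pieces if p]
        branches.append(Node("AND", children=lits))
    return Node("OR", children=branches)


def prune_tree(node: Node, n: int = N):
    """Remove literal nodes too short to produce an N-gram.
    Returns None when the node no longer constrains the search."""
    if node.kind == "LIT":
        return node if len(node.literal) >= n else None
    kids = [prune_tree(c, n) for c in node.children]
    if node.kind == "OR" and any(k is None for k in kids):
        return None  # one unconstrained alternative makes the whole OR unconstrained
    kids = [k for k in kids if k is not None]
    return Node(node.kind, children=kids) if kids else None


def ngrams(text: str, n: int = N) -> set:
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(batches: dict, n: int = N) -> dict:
    """Toy pruning index: batch id -> distinct N-grams of the column values."""
    return {bid: set().union(*(ngrams(v, n) for v in rows))
            for bid, rows in batches.items()}


def may_match(node, batch_ngrams: set, n: int = N) -> bool:
    """True if the batch cannot be ruled out (false positives are allowed)."""
    if node is None:
        return True
    if node.kind == "LIT":
        return ngrams(node.literal, n) <= batch_ngrams
    results = [may_match(c, batch_ngrams, n) for c in node.children]
    return all(results) if node.kind == "AND" else any(results)


def run_query(pattern: str, batches: dict, n: int = N):
    tree = prune_tree(build_tree(pattern), n)
    index = build_index(batches, n)
    to_scan = [bid for bid in batches if may_match(tree, index[bid], n)]
    rx = re.compile(pattern)
    hits = [v for bid in to_scan for v in batches[bid] if rx.search(v)]
    return to_scan, hits


batches = {
    "b0": ["disk error on node 7", "retry ok"],
    "b1": ["login succeeded", "cache warm"],
    "b2": ["timeout while reading", "ok"],
}
print(run_query("err.*node|timeout", batches))
```

In this sketch a literal shorter than N is dropped from its AND node, which only loosens the predicate. If an alternative loses all of its literals, the enclosing OR can no longer exclude any batch unit, so every batch unit is scanned. Both choices keep the pruning conservative: a batch unit is skipped only when the index shows that a required N-gram is absent.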