US 11,989,184 B2
Regular expression search query processing using pruning index
Thierry Cruanes, San Mateo, CA (US); Ismail Oukid, Berlin (DE); Stefan Richter, Berlin (DE); and Alejandro Salinger, Berlin (DE)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by Snowflake Inc., Bozeman, MT (US)
Filed on Apr. 24, 2023, as Appl. No. 18/305,993.
Application 18/305,993 is a continuation of application No. 17/934,977, filed on Sep. 23, 2022, granted, now 11,681,708.
Application 17/934,977 is a continuation in part of application No. 17/649,642, filed on Feb. 1, 2022, granted, now 11,487,763.
Application 17/649,642 is a continuation of application No. 17/486,426, filed on Sep. 27, 2021, granted, now 11,275,739.
Application 17/486,426 is a continuation of application No. 17/484,817, filed on Sep. 24, 2021, granted, now 11,275,738.
Application 17/484,817 is a continuation in part of application No. 17/388,160, filed on Jul. 29, 2021, granted, now 11,321,325.
Application 17/388,160 is a continuation of application No. 17/218,962, filed on Mar. 31, 2021, granted, now 11,113,286.
Application 17/218,962 is a continuation of application No. 17/086,228, filed on Oct. 30, 2020, granted, now 10,997,179.
Application 17/086,228 is a continuation in part of application No. 16/932,462, filed on Jul. 17, 2020, granted, now 10,942,925.
Application 16/932,462 is a continuation of application No. 16/727,315, filed on Dec. 26, 2019, granted, now 10,769,150.
Claims priority of provisional application 63/260,874, filed on Sep. 3, 2021.
Claims priority of provisional application 63/084,394, filed on Sep. 28, 2020.
Prior Publication US 2023/0342362 A1, Oct. 26, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/24 (2019.01); G06F 16/22 (2019.01); G06F 16/2455 (2019.01); G06F 16/28 (2019.01); G06F 16/9035 (2019.01)
CPC G06F 16/24557 (2019.01) [G06F 16/2272 (2019.01); G06F 16/283 (2019.01); G06F 16/9035 (2019.01)]
20 Claims
 
1. A system comprising:
at least one hardware processor; and
at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising:
receiving a query directed at a source table organized into a set of batch units, the query comprising a regular expression search pattern;
converting the regular expression search pattern to a pruning index predicate, the converting of the regular expression search pattern to the pruning index predicate comprising:
generating an expression tree comprising a tree data structure that includes a set of substring literals extracted from the regular expression search pattern; and
removing, from the expression tree, a node corresponding to a substring literal that does not produce an N-gram;
generating a set of N-grams based on the expression tree;
identifying, using a pruning index, a subset of batch units to scan for data matching the query based on the set of N-grams, the pruning index indexing distinct N-grams in each column of the source table; and
processing the query by scanning the subset of batch units.
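The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation and makes several simplifying assumptions that do not come from the claim: N is fixed at 3, the "regular expression" is limited to a toy grammar (top-level `|` alternation of literals joined by `.*`), the pruning index is a plain Python set of distinct N-grams per batch unit for a single column, and all names (`Node`, `build_tree`, `prune_tree`, `select_batches`, `run_query`) are hypothetical.

```python
import re
from dataclasses import dataclass, field

N = 3  # assumed N-gram length; the claim does not fix a value


@dataclass
class Node:
    kind: str                      # "AND", "OR", or "LIT"
    literal: str = ""              # used only by "LIT" nodes
    children: list = field(default_factory=list)


def ngrams(text, n=N):
    """Distinct N-grams of a string (empty set if the string is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_tree(pattern):
    """Toy parser (assumption): 'lit.*lit|lit' becomes OR(AND(LIT, LIT), AND(LIT))."""
    alternatives = []
    for alt in pattern.split("|"):
        leaves = [Node("LIT", literal=lit) for lit in alt.split(".*")]
        alternatives.append(Node("AND", children=leaves))
    return Node("OR", children=alternatives)


def prune_tree(node, n=N):
    """Remove literal nodes that cannot produce an N-gram.

    Returns None when the (sub)tree no longer constrains the match,
    which means no batch unit can be ruled out on its basis.
    """
    if node.kind == "LIT":
        return node if len(node.literal) >= n else None
    kept = [c for c in (prune_tree(ch, n) for ch in node.children) if c is not None]
    if node.kind == "AND":
        return Node("AND", children=kept) if kept else None
    # OR: one unconstrained alternative makes the whole disjunction unconstrained.
    return Node("OR", children=kept) if len(kept) == len(node.children) else None


def tree_ngrams(node, n=N):
    """Set of N-grams generated from the literals remaining in the tree."""
    if node is None:
        return set()
    if node.kind == "LIT":
        return ngrams(node.literal, n)
    return set().union(*(tree_ngrams(c, n) for c in node.children))


def build_index(batches, n=N):
    """Stand-in pruning index: distinct N-grams per batch unit (one column)."""
    return {bid: set().union(*(ngrams(v, n) for v in rows))
            for bid, rows in batches.items()}


def may_match(node, batch_ngrams, n=N):
    """True if the batch unit might contain a match (false positives possible)."""
    if node is None:
        return True
    if node.kind == "LIT":
        return ngrams(node.literal, n) <= batch_ngrams
    results = (may_match(c, batch_ngrams, n) for c in node.children)
    return all(results) if node.kind == "AND" else any(results)


def select_batches(pattern, index, n=N):
    """Identify the subset of batch units that must be scanned."""
    tree = prune_tree(build_tree(pattern), n)
    return [bid for bid, grams in index.items() if may_match(tree, grams, n)]


def run_query(pattern, batches, index):
    """Scan only the selected batch units with the real regular expression."""
    return [row for bid in select_batches(pattern, index)
            for row in batches[bid] if re.search(pattern, row)]


batches = {0: ["error: disk full", "ok"],
           1: ["warning: low memory"],
           2: ["all good"]}
index = build_index(batches)
print(select_batches("disk.*full|memory", index))  # batch 2 is pruned
print(select_batches("ab|disk", index))            # "ab" yields no 3-gram, so nothing is pruned
```

Because N-gram containment is only a necessary condition for a match, the selected batch units are still scanned with the full pattern in `run_query`; the index can only rule batch units out, never confirm a match.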