US 12,079,232 B2
Configurable approximate search of character strings
Viliam Holub, Prague (CZ); Eoin Shanley, Dublin (IE); and Trevor Parsons, Boston, MA (US)
Assigned to Rapid7, Inc., Boston, MA (US)
Filed by Rapid7, Inc., Boston, MA (US)
Filed on Sep. 8, 2022, as Appl. No. 17/940,069.
Application 17/940,069 is a continuation of application No. 16/732,165, filed on Dec. 31, 2019, granted, now 11,468,074.
Prior Publication US 2023/0004561 A1, Jan. 5, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/24 (2019.01); G06F 16/2455 (2019.01); G06F 16/2458 (2019.01); G06N 5/022 (2023.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06F 16/2462 (2019.01) [G06F 16/24553 (2019.01); G06N 5/022 (2013.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A method comprising:
performing, by an approximate search system implemented by one or more hardware processors with associated memory:
receiving, via a configuration interface of the approximate search system, configuration information for executing an approximate search of a string in a text, wherein the configuration information limits an amount of memory used to execute the approximate search and specifies:
(a) deviation operations permitted on the string during the approximate search,
(b) respective costs of the deviation operations, and
(c) a cost limit for the approximate search;
executing the approximate search according to the configuration information, comprising:
maintaining in memory a subset of states of a state machine generated during the approximate search, wherein individual ones of the states specify:
(a) a match position in the string achieved at the individual state,
(b) a last deviation operation performed on the string to achieve the individual state, and
(c) a cost accumulated for any deviation operations performed on the string to achieve the individual state;
repeatedly modifying the subset of states in memory for successive characters in the text, comprising:
generating one or more new states from one or more preceding states of the state machine by applying one or more last deviation operations indicated by the one or more preceding states to advance search paths in the approximate search, and
pruning one or more existing states of the state machine whose accumulated cost from previous deviation operations exceeds the cost limit, so that the approximate search abandons respective search paths leading to the one or more existing states;
determining an approximate match of the string in the text in response to reaching an end state of the state machine, wherein a match position of the end state equals a final match position of the string, and a cost of the end state is below the cost limit for the approximate search; and
outputting the approximate match of the string.
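
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; it is one possible reading under stated assumptions. The names `Config`, `State`, `closure`, and `approximate_search` are hypothetical, as are the operation labels "substitute", "insert", and "delete". The sketch also assumes that the memory limit is expressed as a cap on the number of retained states (`max_states`), that a state is kept while its cost does not exceed `cost_limit`, and that only the cheapest state per match position is retained.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class State:
    pos: int       # (a) match position reached in the search string
    last_op: str   # (b) last deviation operation ("none" for an exact step)
    cost: int      # (c) accumulated cost of all deviations so far


@dataclass
class Config:
    op_costs: Dict[str, int]  # permitted deviation operations and their costs
    cost_limit: int           # states costing more than this are pruned
    max_states: int           # assumed memory cap: states kept per step


def closure(states: Iterable[State], pattern: str, cfg: Config) -> List[State]:
    """Prune over-limit states, add 'delete' successors, keep the cheapest per position."""
    best: Dict[int, State] = {}
    stack = list(states)
    while stack:
        s = stack.pop()
        if s.cost > cfg.cost_limit:
            continue  # pruning: this search path is abandoned
        if s.pos in best and best[s.pos].cost <= s.cost:
            continue  # an equally cheap or cheaper state already covers this position
        best[s.pos] = s
        # Deleting a pattern character advances the pattern without consuming text.
        if "delete" in cfg.op_costs and s.pos < len(pattern):
            stack.append(State(s.pos + 1, "delete", s.cost + cfg.op_costs["delete"]))
    kept = sorted(best.values(), key=lambda s: (s.cost, -s.pos))
    return kept[: cfg.max_states]  # assumed memory limit: drop the costliest states


def approximate_search(pattern: str, text: str, cfg: Config) -> List[Tuple[int, int]]:
    """Return (end_index, cost) for each text position where an end state is reached."""
    start = State(0, "none", 0)
    states = closure([start], pattern, cfg)
    matches: List[Tuple[int, int]] = []
    for i, ch in enumerate(text):
        nxt = [start]  # a match may begin at any text position
        for s in states:
            if s.pos < len(pattern):
                if pattern[s.pos] == ch:
                    nxt.append(State(s.pos + 1, "none", s.cost))
                elif "substitute" in cfg.op_costs:
                    nxt.append(State(s.pos + 1, "substitute",
                                     s.cost + cfg.op_costs["substitute"]))
            # An extra text character inside a partial match keeps the position.
            if "insert" in cfg.op_costs and 0 < s.pos < len(pattern):
                nxt.append(State(s.pos, "insert", s.cost + cfg.op_costs["insert"]))
        states = closure(nxt, pattern, cfg)
        for s in states:
            if s.pos == len(pattern):  # end state: the whole string is matched
                matches.append((i + 1, s.cost))
    return matches


if __name__ == "__main__":
    cfg = Config({"substitute": 1, "insert": 1, "delete": 1}, cost_limit=1, max_states=16)
    print(approximate_search("hello", "say helo world", cfg))
```

In this sketch, the working set never grows beyond `max_states` entries regardless of text length, which is one way a configuration could bound memory. A small `max_states` may discard states that would otherwise lead to a match, so the cap trades recall for memory.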