US 11,748,410 B2
System and method for pre-indexing filtering and correction of documents in search systems
Bruce Edward Kiefer, Denver, CO (US); and Gregory John Berka, Centennial, CO (US)
Assigned to OPEN TEXT HOLDINGS, INC., Menlo Park, CA (US)
Filed by OPEN TEXT HOLDINGS, INC., San Mateo, CA (US)
Filed on Oct. 12, 2021, as Appl. No. 17/499,710.
Application 17/499,710 is a continuation of application No. 16/582,882, filed on Sep. 25, 2019, granted, now 11,176,198.
Prior Publication US 2022/0067094 A1, Mar. 3, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 7/02 (2006.01); G06F 16/00 (2019.01); G06F 16/901 (2019.01); G06F 16/93 (2019.01); G06F 40/10 (2020.01); G06F 16/14 (2019.01); G06F 40/284 (2020.01); G06F 16/2458 (2019.01); G06F 16/31 (2019.01)
CPC G06F 16/901 (2019.01) [G06F 16/93 (2019.01); G06F 16/144 (2019.01); G06F 16/2465 (2019.01); G06F 16/316 (2019.01); G06F 40/10 (2020.01); G06F 40/284 (2020.01)]
21 Claims
 
1. A search system, comprising:
a processor;
a data store, having an index of a corpus stored thereon, wherein the corpus comprises a set of documents; and
a non-transitory computer readable medium, having instructions executable on the processor for:
receiving a first set of tokens for a document from a text extractor;
providing the first set of tokens to a detector for determining if each of the received first set of tokens is of an associated type;
determining an associated detector score based on the determination if each of the received first set of tokens is of the associated type of token, the detector score associated with the associated type;
determining a filter score based on the detector score produced by the detector, wherein the filter score is based on a scoring rule associated with the detector or the associated type;
determining whether the document should be indexed based on the application of a verdict rule, wherein the verdict rule includes an expression for evaluating the filter score associated with the detector; and
when it is determined that the document should be indexed, providing the document to an indexer adapted to index the first set of tokens for the document.
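The flow recited in claim 1 can be illustrated with a short sketch. Tokens from a text extractor go to detectors. Each detector yields a detector score for its token type, a scoring rule turns that into a filter score, and a verdict rule expression over the filter scores decides whether the indexer receives the document. This is a minimal, non-authoritative illustration, not the patented implementation. Every concrete choice is a hypothetical assumption: the `Detector` class, the "nonword" and "numeric" token types, the detector score as the fraction of matching tokens, scoring rules that scale to a percentage, the verdict rule written as a Python callable, and a toy inverted index held in a `dict`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Detector:
    """Hypothetical detector for one associated token type."""
    token_type: str
    is_type: Callable[[str], bool]

    def detector_score(self, tokens: List[str]) -> float:
        # Assumption: the detector score is the fraction of tokens
        # judged to be of this detector's associated type.
        if not tokens:
            return 0.0
        return sum(1 for t in tokens if self.is_type(t)) / len(tokens)


def filter_scores(tokens: List[str], detectors: List[Detector],
                  scoring_rules: Dict[str, Callable[[float], float]]) -> Dict[str, float]:
    # Apply the scoring rule associated with each detector's type
    # to that detector's score, yielding one filter score per type.
    return {d.token_type: scoring_rules[d.token_type](d.detector_score(tokens))
            for d in detectors}


def process_document(doc_id: str, tokens: List[str], detectors: List[Detector],
                     scoring_rules: Dict[str, Callable[[float], float]],
                     verdict_rule: Callable[[Dict[str, float]], bool],
                     index: Dict[str, Set[str]]) -> bool:
    """Return True and index the tokens if the verdict rule passes."""
    scores = filter_scores(tokens, detectors, scoring_rules)
    if not verdict_rule(scores):
        return False  # filtered out before indexing
    # Toy "indexer": a hypothetical inverted index mapping token -> doc ids.
    for tok in tokens:
        index.setdefault(tok.lower(), set()).add(doc_id)
    return True


# Hypothetical configuration (illustrative values only).
DETECTORS = [
    Detector("nonword", lambda t: re.search(r"[A-Za-z]", t) is None),
    Detector("numeric", lambda t: re.fullmatch(r"\d+", t) is not None),
]
SCORING_RULES = {
    "nonword": lambda s: 100.0 * s,  # express as a percentage
    "numeric": lambda s: 100.0 * s,
}
# Verdict rule expression over the filter scores.
VERDICT_RULE = lambda f: f["nonword"] < 50.0 and f["numeric"] < 80.0
```

As a usage example, `process_document("doc1", ["Search", "system", "index", "2021"], DETECTORS, SCORING_RULES, VERDICT_RULE, index)` passes, because only one of four tokens lacks letters. A token list such as `["@@", "##", "%%", "text"]` is rejected and never reaches the index. Keeping the verdict as a separate expression over named filter scores lets thresholds change without touching the detectors themselves.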