CPC G06F 16/901 (2019.01) [G06F 16/93 (2019.01); G06F 16/144 (2019.01); G06F 16/2465 (2019.01); G06F 16/316 (2019.01); G06F 40/10 (2020.01); G06F 40/284 (2020.01)] | 21 Claims |
1. A search system, comprising:
a processor;
a data store, having an index of a corpus stored thereon, wherein the corpus comprises a set of documents; and
a non-transitory computer readable medium, having instructions executable on the processor for:
receiving a first set of tokens for a document from a text extractor;
providing the first set of tokens to a detector for determining if each of the received first set of tokens is of an associated type;
determining an associated detector score based on the determination if each of the received first set of tokens is of the associated type of token, the detector score associated with the associated type;
determining a filter score based on the detector score produced by the detector, wherein the filter score is based on a scoring rule associated with the detector or the associated type;
determining whether the document should be indexed based on the application of a verdict rule, wherein the verdict rule includes an expression for evaluating the filter score associated with the detector; and
when it is determined that the document should be indexed, providing the document to an indexer adapted to index the first set of tokens for the document.
|
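The flow recited in claim 1 (detector score, then filter score, then verdict rule, then indexing) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the names `Detector`, `filter_score`, `should_index`, `Indexer`, and `process` are hypothetical, the detector score is assumed to be the fraction of tokens of the associated type, the scoring rule is assumed to be a 0.5 threshold, and the verdict rule is assumed to reject a document whose "numeric" filter score is nonzero. The claim itself fixes none of these choices.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Detector:
    """Hypothetical detector that checks each token against one associated type."""
    token_type: str
    is_of_type: Callable[[str], bool]

    def score(self, tokens: List[str]) -> float:
        # Assumed detector score: fraction of tokens that are of the associated type.
        if not tokens:
            return 0.0
        return sum(1 for t in tokens if self.is_of_type(t)) / len(tokens)


def filter_score(detector_score: float, threshold: float = 0.5) -> int:
    # Assumed scoring rule: the filter score is 1 once the detector score
    # reaches the threshold, otherwise 0.
    return 1 if detector_score >= threshold else 0


def should_index(filter_scores: Dict[str, int]) -> bool:
    # Assumed verdict rule: an expression over the filter scores; here the
    # document is indexed only when the "numeric" filter did not fire.
    return filter_scores.get("numeric", 0) == 0


@dataclass
class Indexer:
    """Hypothetical inverted index mapping each token to document ids."""
    index: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, doc_id: str, tokens: List[str]) -> None:
        for t in tokens:
            self.index.setdefault(t, set()).add(doc_id)


def process(doc_id: str, tokens: List[str], detector: Detector, indexer: Indexer) -> bool:
    # Tokens arrive from a text extractor (not modeled here).
    d_score = detector.score(tokens)
    f_score = filter_score(d_score)
    verdict = should_index({detector.token_type: f_score})
    if verdict:
        indexer.add(doc_id, tokens)
    return verdict


if __name__ == "__main__":
    numeric = Detector("numeric", str.isdigit)
    indexer = Indexer()
    print(process("doc-a", ["quarterly", "report", "2019"], numeric, indexer))
    print(process("doc-b", ["10", "20", "30", "x"], numeric, indexer))
```

In this sketch a mostly textual document passes the verdict rule and is indexed, while a document dominated by numeric tokens is filtered out before it reaches the indexer. Keeping the scoring rule and the verdict rule as separate functions mirrors the claim's separation of the filter score from the expression that evaluates it.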