| CPC G06F 16/906 (2019.01) [G06F 16/9017 (2019.01); G06F 16/93 (2019.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
determining, by processing hardware, a probabilistic data structure comprising a bit vector with sets of bit values mapped to a plurality of items in a lookup list, the plurality of items including one or more multi-token items;
determining, by the processing hardware, a maximum number of tokens of the one or more multi-token items in the lookup list;
determining, by the processing hardware and for a selected token from text content in a digital document, a set of sequential tokens including the selected token based on the maximum number of tokens;
generating, by the processing hardware, classifications for the text content in the digital document of a digital data repository by iteratively:
  comparing the set of sequential tokens to the sets of bit values mapped to the plurality of items in the lookup list;
  reducing a number of tokens in the set of sequential tokens for a subsequent comparison; and
providing, for display within a graphical user interface of a client device, indications of the classifications of the text content in the digital document relative to the plurality of items in the lookup list.
|
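The claimed steps resemble a Bloom-filter lookup combined with longest-match-first scanning over a window of sequential tokens. The sketch below shows one way this could work, under assumptions the claim does not state: lowercase whitespace tokenization, a SHA-256 double-hashing scheme, a 1024-bit vector with 4 hash positions per item, and the hypothetical names `BloomFilter`, `build_index`, and `classify`. It is an illustration, not the patented implementation.

```python
# Hypothetical sketch: BloomFilter, build_index, and classify are illustrative
# names, not taken from the patent.
import hashlib
from typing import Iterable, List, Tuple


class BloomFilter:
    """Bit vector in which each item sets num_hashes bit positions (a Bloom filter)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Assumed scheme: derive all positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        # Map the item to its set of bit values by turning those bits on.
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_index(lookup_list: Iterable[str]) -> Tuple[BloomFilter, int]:
    """Insert every lookup item and record the maximum item length in tokens."""
    bloom = BloomFilter()
    max_tokens = 1
    for item in lookup_list:
        tokens = item.lower().split()  # assumed: lowercase + whitespace tokenization
        bloom.add(" ".join(tokens))
        max_tokens = max(max_tokens, len(tokens))
    return bloom, max_tokens


def classify(text: str, bloom: BloomFilter, max_tokens: int) -> List[Tuple[int, str]]:
    """Return (start_index, phrase) pairs, preferring the longest matching window."""
    tokens = text.lower().split()
    matches: List[Tuple[int, str]] = []
    i = 0
    while i < len(tokens):
        # Window of up to max_tokens sequential tokens starting at the selected token.
        n = min(max_tokens, len(tokens) - i)
        while n > 0:
            candidate = " ".join(tokens[i:i + n])
            if bloom.might_contain(candidate):
                matches.append((i, candidate))
                break
            n -= 1  # reduce the window by one token for the next comparison
        # Skip past the matched phrase, or advance one token if nothing matched.
        i += max(n, 1)
    return matches


if __name__ == "__main__":
    lookup = ["new york", "new york city", "london"]
    bloom, max_tokens = build_index(lookup)
    print(max_tokens)  # 3
    print(classify("She moved from London to New York City", bloom, max_tokens))
```

Because each window starts at the longest allowed length and shrinks one token at a time, "New York City" is reported rather than the shorter "New York". A Bloom filter never yields false negatives but can yield false positives, so a practical system might confirm each hit against the exact lookup list; the claim itself does not say whether such a check is performed.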