CPC G06F 40/284 (2020.01) | 10 Claims |
1. A computing system comprising:
a processor; and
memory storing instructions that, when executed by the processor, cause the processor to perform acts comprising:
receiving a request to tokenize alphanumeric text that includes a word; and
tokenizing the alphanumeric text such that a sequence of numeric identifiers that represents the alphanumeric text is output, wherein tokenizing the alphanumeric text comprises:
providing the alphanumeric text to a computer-implemented finite state machine, where the finite state machine includes a final state;
generating at least one token for the word based upon a value assigned to the final state of the computer-implemented finite state machine, where each token in the at least one token is included in a predefined vocabulary employed by the computing system when tokenizing text; and
outputting at least one numeric identifier as a representation of the word based upon the at least one token,
wherein the at least one numeric identifier is identified from a predefined set of numeric identifiers, and further wherein a class is assigned to the alphanumeric text by a computer-implemented text classifier based upon the output sequence of numeric identifiers.
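For illustration only, the acts recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the finite state machine is assumed to be a character trie, the value assigned to a final state is assumed to be a precomputed tuple of vocabulary tokens, and the names `VOCAB`, `SEGMENTATIONS`, `FiniteStateMachine`, `tokenize`, and `classify` are hypothetical. The classifier is a toy stand-in for a computer-implemented text classifier.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical predefined vocabulary: token -> numeric identifier.
VOCAB: Dict[str, int] = {"[UNK]": 0, "token": 1, "##ize": 2, "text": 3, "the": 4}

# Hypothetical precomputed segmentations: word -> tokens drawn from VOCAB.
SEGMENTATIONS: Dict[str, List[str]] = {
    "tokenize": ["token", "##ize"],
    "token": ["token"],
    "text": ["text"],
    "the": ["the"],
}


class FiniteStateMachine:
    """Character-level FSM (a trie) whose final states carry a value."""

    def __init__(self) -> None:
        # transitions[state][char] -> next state; state 0 is the start state.
        self.transitions: List[Dict[str, int]] = [{}]
        # Value assigned to each final state: the tokens for the word.
        self.final_values: Dict[int, Tuple[str, ...]] = {}

    def add_word(self, word: str, tokens: List[str]) -> None:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.final_values[state] = tuple(tokens)

    def run(self, word: str) -> Optional[Tuple[str, ...]]:
        """Return the final-state value, or None if no final state is reached."""
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                return None
            state = nxt
        return self.final_values.get(state)


def build_fsm() -> FiniteStateMachine:
    fsm = FiniteStateMachine()
    for word, tokens in SEGMENTATIONS.items():
        fsm.add_word(word, tokens)
    return fsm


def tokenize(text: str, fsm: FiniteStateMachine) -> List[int]:
    """Map each whitespace-separated word to numeric identifiers from VOCAB."""
    ids: List[int] = []
    for word in text.lower().split():
        tokens = fsm.run(word) or ("[UNK]",)  # assumed fallback for unknown words
        ids.extend(VOCAB[token] for token in tokens)
    return ids


def classify(ids: List[int]) -> str:
    """Toy stand-in classifier operating on the output identifier sequence."""
    return "about_tokenization" if VOCAB["##ize"] in ids else "other"


if __name__ == "__main__":
    machine = build_fsm()
    identifiers = tokenize("Tokenize the text", machine)
    print(identifiers, classify(identifiers))
```

In this sketch a word that does not end in a final state, such as a bare prefix of a known word, falls back to the `[UNK]` identifier; the claim itself does not specify any fallback behavior.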