| CPC G06F 40/117 (2020.01) [G06F 40/143 (2020.01); G06F 40/151 (2020.01); G06F 40/166 (2020.01); G06F 40/284 (2020.01)] | 18 Claims |

|
1. A computer-implemented method for tagging electronic documents, the computer-implemented method comprising:
receiving, by an input module, an electronic document to be tagged;
preprocessing, by a preprocessing module, the electronic document to be tagged, the preprocessing comprising:
extracting a text from the electronic document to be tagged;
replacing at least one of a numerical amount or a date in the extracted text with a predetermined symbol, wherein the predetermined symbol is not used in the extracted text before replacing at least one of the numerical amount or the date in the extracted text, wherein the predetermined symbol includes a special character that is non-numeric and non-alphabetic; and
tokenizing the extracted text with the predetermined symbol into a plurality of tokens without fragmenting the predetermined symbol, wherein keeping the predetermined symbol unfragmented avoids inaccurate tagging associated with the predetermined symbol;
determining, by a deep learning module, a tag for at least one of the plurality of tokens; and
outputting, by an output module, the determined tag for the at least one of the plurality of tokens.
|
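The preprocessing steps recited in claim 1 (replacing numerical amounts and dates with a previously unused symbol, then tokenizing without splitting that symbol) can be illustrated with a minimal sketch. It is not the patented implementation. The candidate symbol list, the regular expressions, and the function names `pick_symbol`, `mask_values`, and `tokenize` are all assumptions introduced for illustration, and the patterns cover only a few common date and amount formats. The deep learning tagging step is omitted.

```python
import re

# Hypothetical placeholder candidates; each is non-numeric and non-alphabetic.
CANDIDATE_SYMBOLS = ["§", "¤", "¶", "@@", "~#~"]

# Simplified, non-exhaustive patterns (assumptions for this sketch).
DATE_RE = re.compile(
    r"\b\d{4}-\d{2}-\d{2}\b"
    r"|\b\d{1,2}/\d{1,2}/\d{2,4}\b"
    r"|\b(?:January|February|March|April|May|June|July|August|September"
    r"|October|November|December) \d{1,2}, \d{4}\b"
)
AMOUNT_RE = re.compile(r"[$€£]?\d+(?:,\d{3})*(?:\.\d+)?%?")


def pick_symbol(text, candidates=CANDIDATE_SYMBOLS):
    """Return the first candidate that does not already occur in the text."""
    for symbol in candidates:
        if symbol not in text:
            return symbol
    raise ValueError("no unused placeholder symbol available")


def mask_values(text):
    """Replace dates and numerical amounts with one unused placeholder symbol."""
    symbol = pick_symbol(text)
    # Dates go first so their digits are not consumed by the amount pattern.
    text = DATE_RE.sub(symbol, text)
    text = AMOUNT_RE.sub(symbol, text)
    return text, symbol


def tokenize(text, symbol):
    """Split into word and punctuation tokens, keeping the placeholder whole."""
    # The escaped placeholder is the first alternative, so a multi-character
    # placeholder such as "@@" is matched as a single token.
    pattern = re.compile(re.escape(symbol) + r"|\w+|[^\w\s]")
    return pattern.findall(text)


if __name__ == "__main__":
    masked, symbol = mask_values("Revenue was $1,234.50 on March 31, 2020.")
    print(masked)
    print(tokenize(masked, symbol))
```

Listing the placeholder as the first alternative in the tokenizer pattern is one simple way to keep it unfragmented. A subword tokenizer would instead need the placeholder registered as a reserved token, which this sketch does not attempt.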