CPC G06F 16/90344 (2019.01) [G06F 16/93 (2019.01)] | 17 Claims |
1. A method for machine learning-based data extraction, comprising:
receiving, by a computing device, a document comprising one or more strings, the document associated with a document label separate from the one or more strings;
determining, by the computing device, whether the document label indicates a type of document structure;
receiving, by the computing device, a rule associated with the document label comprising a string sequence;
determining, by the computing device, whether the string sequence matches the one or more strings in the document by applying a regular expression parser to match the string sequence with the one or more strings in the document;
increasing, by the computing device, a confidence score in response to the string sequence matching the one or more strings in the document;
determining, by the computing device, a similarity score indicating a similarity between the string sequence and the one or more strings in the document;
increasing, by the computing device, the confidence score in response to the similarity score meeting or exceeding a first score value; and
displaying, by the computing device, on a display, the one or more strings in the document in response to the confidence score meeting or exceeding a second score value.
|
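The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation; it only shows one way the scoring flow could be arranged. Everything named here is an assumption made for illustration: the `Rule` class, the `STRUCTURED_LABELS` set, the `extract` function, the score increments, and the default threshold values. The claim does not say how the similarity score is computed, so the sketch substitutes the character-level ratio from Python's `difflib.SequenceMatcher`. It also assumes that processing stops when the label does not indicate a known document structure, and that the strings displayed are the ones the regular expression matched.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical set of labels that indicate a known document structure.
STRUCTURED_LABELS = {"invoice", "purchase_order"}


@dataclass
class Rule:
    label: str            # document label the rule is associated with
    string_sequence: str  # sequence to look for, used here as a regex pattern


def extract(document_strings, document_label, rule,
            first_score_value=0.8, second_score_value=1.5):
    """Return (confidence, strings_to_display) for one document and one rule."""
    confidence = 0.0

    # Check whether the label indicates a type of document structure.
    # Assumption: an unrecognized label, or a rule for another label, ends processing.
    if document_label not in STRUCTURED_LABELS or rule.label != document_label:
        return confidence, []

    # Apply a regular expression to match the string sequence against the strings.
    pattern = re.compile(rule.string_sequence)
    matched = [s for s in document_strings if pattern.search(s)]
    if matched:
        confidence += 1.0  # assumed increment for a regex match

    # Similarity between the string sequence and the document strings.
    # Assumption: the best character-level ratio over all strings is used.
    similarity = max(
        (SequenceMatcher(None, rule.string_sequence, s).ratio()
         for s in document_strings),
        default=0.0,
    )
    if similarity >= first_score_value:
        confidence += 1.0  # assumed increment for a sufficient similarity score

    # Strings are surfaced for display only when the confidence is high enough.
    to_display = matched if confidence >= second_score_value else []
    return confidence, to_display
```

With a hypothetical rule `Rule("invoice", "Invoice Number")`, a document containing the exact string receives both increments and is displayed, while a document containing only the misspelling "Invoice Numbr" receives the similarity increment alone and stays below the second score value.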