| CPC G06F 16/90344 (2019.01) [G06F 16/93 (2019.01)] | 17 Claims | 

| 
               1. A method for machine learning-based data extraction, comprising: 
                receiving, by a computing device, a document comprising one or more strings, the document associated with a document label separate from the one or more strings; 
                determining, by the computing device, whether the document label indicates a type of document structure; 
                receiving, by the computing device, a rule associated with the document label comprising a string sequence; 
                determining, by the computing device, whether the string sequence matches the one or more strings in the document by applying a regular expression parser to match the string sequence with the one or more strings in the document; 
                increasing, by the computing device, a confidence score in response to the string sequence matching the one or more strings in the document; 
                determining, by the computing device, a similarity score indicating a similarity between the string sequence and the one or more strings in the document; 
                increasing, by the computing device, the confidence score in response to the similarity score meeting or exceeding a first score value; and 
                displaying, by the computing device, on a display, the one or more strings in the document in response to the confidence score meeting or exceeding a second score value. 
               |
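
The steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. It only shows one way the label check, regular-expression match, similarity score, and two thresholds could fit together. Everything named here is an assumption made for illustration: the function `extract_strings`, the label set `STRUCTURED_LABELS`, the threshold constants, the 0.5 confidence increments, the use of `difflib.SequenceMatcher` as the similarity measure, and the choice to stop early when the label does not indicate a document structure (the claim does not say what happens in that case).

```python
import re
from difflib import SequenceMatcher

# All values below are hypothetical; the claim does not specify them.
STRUCTURED_LABELS = {"invoice", "receipt"}  # labels assumed to indicate a document structure
FIRST_SCORE_VALUE = 0.8    # similarity threshold ("first score value")
SECOND_SCORE_VALUE = 1.0   # confidence threshold for display ("second score value")
INCREMENT = 0.5            # assumed size of each confidence increase


def extract_strings(strings, label, rules):
    """Return (confidence, strings_to_display) for one labeled document.

    strings: the document's strings
    label:   the document label, kept separate from the strings
    rules:   mapping from label to a string sequence used as a regex pattern
    """
    # Determine whether the label indicates a type of document structure.
    # Assumption: stop early if it does not, or if no rule exists for it.
    if label not in STRUCTURED_LABELS or label not in rules:
        return 0.0, []

    string_sequence = rules[label]
    confidence = 0.0

    # Regular-expression match of the rule's string sequence against the strings.
    if any(re.search(string_sequence, s) for s in strings):
        confidence += INCREMENT

    # Similarity between the string sequence and the document strings.
    # Assumption: take the best ratio over all strings.
    similarity = max(
        (SequenceMatcher(None, string_sequence, s).ratio() for s in strings),
        default=0.0,
    )
    if similarity >= FIRST_SCORE_VALUE:
        confidence += INCREMENT

    # "Display" is modeled as returning the strings once confidence is high enough.
    if confidence >= SECOND_SCORE_VALUE:
        return confidence, list(strings)
    return confidence, []
```

Under these assumptions, a document labeled `"invoice"` whose strings include one that both matches the rule's pattern and closely resembles it would pass both checks and be returned for display, while a document that only matches the pattern inside a much longer string would receive a single increment and stay below the display threshold.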