US 12,339,908 B2
Systems and methods for machine learning-based data extraction
Ankit Kumar Sinha, Dallas, TX (US); Hasan Kohadawala, Chennai (IN); Bhargava Reddy Karumuri, Dallas, TX (US); Saravanan Annamalai, Chennai (IN); Kishorekumar Torangallu, Dallas, TX (US); and SaiNikitha Cheruku, Bengaluru (IN)
Assigned to Nationstar Mortgage LLC, Coppell, TX (US)
Filed by Nationstar Mortgage LLC, Coppell, TX (US)
Filed on Sep. 28, 2021, as Appl. No. 17/488,108.
Prior Publication US 2023/0101817 A1, Mar. 30, 2023
Int. Cl. G06F 16/90 (2019.01); G06F 16/903 (2019.01); G06F 16/93 (2019.01)
CPC G06F 16/90344 (2019.01) [G06F 16/93 (2019.01)]
17 Claims
 
1. A method for machine learning-based data extraction, comprising:
receiving, by a computing device, a document comprising one or more strings, the document associated with a document label separate from the one or more strings;
determining, by the computing device, whether the document label indicates a type of document structure;
receiving, by the computing device, a rule associated with the document label comprising a string sequence;
determining, by the computing device, whether the string sequence matches the one or more strings in the document by applying a regular expression parser to match the string sequence with the one or more strings in the document;
increasing, by the computing device, a confidence score in response to the string sequence matching the one or more strings in the document;
determining, by the computing device, a similarity score indicating a similarity between the string sequence and the one or more strings in the document;
increasing, by the computing device, the confidence score in response to the similarity score meeting or exceeding a first score value; and
displaying, by the computing device, on a display, the one or more strings in the document in response to the confidence score meeting or exceeding a second score value.
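
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The rule table, the score increments, both threshold values, and the use of difflib's SequenceMatcher as the similarity measure are all assumptions made for illustration; the claim specifies none of them.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rule table: document label -> string sequence to look for.
# The labels and sequences here are invented for illustration.
RULES = {
    "W2": "Wage and Tax Statement",
    "1003": "Uniform Residential Loan Application",
}

# Illustrative values only; the claim leaves all of these unspecified.
MATCH_INCREMENT = 0.5        # added when the regular expression matches
SIMILARITY_INCREMENT = 0.3   # added when similarity meets the first score value
FIRST_SCORE_VALUE = 0.8      # similarity threshold
SECOND_SCORE_VALUE = 0.7     # confidence threshold for display


def sequence_to_pattern(string_sequence):
    """Build a whitespace-tolerant, case-insensitive regex from a string sequence."""
    tokens = (re.escape(token) for token in string_sequence.split())
    return re.compile(r"\s+".join(tokens), re.IGNORECASE)


def similarity(a, b):
    """Return a similarity score in [0, 1] (assumed measure: difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def strings_to_display(document_strings, document_label):
    """Return the document strings whose confidence score meets the display threshold."""
    # The label is stored separately from the strings; here it "indicates a
    # type of document structure" if a rule is registered for it.
    string_sequence = RULES.get(document_label)
    if string_sequence is None:
        return []

    pattern = sequence_to_pattern(string_sequence)
    selected = []
    for text in document_strings:
        confidence = 0.0
        # Increase the confidence score on a regular expression match.
        if pattern.search(text):
            confidence += MATCH_INCREMENT
        # Increase it again when the similarity score meets the first score value.
        if similarity(string_sequence, text) >= FIRST_SCORE_VALUE:
            confidence += SIMILARITY_INCREMENT
        # Keep the string for display when confidence meets the second score value.
        if confidence >= SECOND_SCORE_VALUE:
            selected.append(text)
    return selected
```

Under these assumed values, a string must both match the regular expression and be highly similar to the rule's string sequence before it is returned for display, so calling `strings_to_display(["Wage and Tax Statement", "Employer identification number"], "W2")` would keep the first string and drop the second. Different increments or thresholds would change that behavior.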