US 12,243,337 B2
Text extraction using optical character recognition
Chris Demchalk, Frisco, TX (US); Ryan M. Parker, Dallas, TX (US); Lokesh Vijay Kumar, Frisco, TX (US); and Brian Fromknecht, Richardson, TX (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Mar. 8, 2024, as Appl. No. 18/599,667.
Application 18/599,667 is a continuation of application No. 17/741,113, filed on May 10, 2022, granted, now 11,961,316.
Prior Publication US 2024/0212375 A1, Jun. 27, 2024
Int. Cl. G06V 30/12 (2022.01); G06V 30/148 (2022.01); G06V 30/26 (2022.01); G06V 30/41 (2022.01)
CPC G06V 30/133 (2022.01) [G06V 30/155 (2022.01); G06V 30/26 (2022.01); G06V 30/41 (2022.01)]
17 Claims
 
1. A method, comprising:
extracting, by at least one processor, a first set of text from a document using a first optical character recognition (OCR) tool;
extracting, by the at least one processor, a second set of text from the document using a second OCR tool;
comparing, by the at least one processor, a first metric of the first set of text to a second metric of the second set of text, the first metric measuring a first level of OCR quality of the first set of text and the second metric measuring a second level of OCR quality of the second set of text;
selecting, by the at least one processor, a first selected text from the first set of text or the second set of text based on a higher level of OCR quality;
extracting, by the at least one processor, a third set of text from the document using a third OCR tool;
comparing, by the at least one processor, a corresponding metric of the first selected text to a third metric of the third set of text, the third metric measuring a third level of OCR quality of the third set of text;
determining, for the first set of text, the second set of text, and the third set of text, a fourth metric, wherein the fourth metric comprises measuring a fourth level of OCR quality based on a respective number of words in respective ones of the first set, second set, or third set of extracted texts from the document divided by a number of pages in the document;
selecting, by the at least one processor, a second selected text from the first selected text or the third set of text based on a higher level of the third level of OCR quality or the fourth level of OCR quality; and
storing, by the at least one processor, the second selected text as a final text in a searchable format.
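The selection flow recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. The names `OcrTool`, `alpha_ratio`, and `select_text` are hypothetical. The OCR tools are modeled as plain callables rather than any real OCR library. `alpha_ratio` is an assumed stand-in for the unspecified first, second, and third quality metrics. Only `words_per_page` follows a definition given in the claim (number of extracted words divided by number of pages). The final choice ranks candidates by the quality metric first and by words per page second, which is an assumption about how the two levels combine. The storing step is omitted.

```python
from typing import Callable, Sequence

# Hypothetical OCR tool interface: document bytes in, extracted text out.
OcrTool = Callable[[bytes], str]


def words_per_page(text: str, num_pages: int) -> float:
    """Fourth metric from the claim: extracted word count divided by page count."""
    if num_pages <= 0:
        raise ValueError("num_pages must be positive")
    return len(text.split()) / num_pages


def alpha_ratio(text: str) -> float:
    """Assumed stand-in quality metric: share of tokens that are purely alphabetic."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(token.isalpha() for token in tokens) / len(tokens)


def select_text(
    document: bytes,
    num_pages: int,
    tools: Sequence[OcrTool],
    quality: Callable[[str], float] = alpha_ratio,
) -> str:
    """Run three OCR tools and return the text chosen by the two-stage comparison."""
    if len(tools) != 3:
        raise ValueError("this sketch expects exactly three OCR tools")
    first, second, third = (tool(document) for tool in tools)

    # Stage 1: keep whichever of the first two extractions scores higher.
    first_selected = first if quality(first) >= quality(second) else second

    # Stage 2: compare the stage-1 winner with the third extraction.
    # Assumption: rank by quality, then by words per page as a tie-breaker.
    def rank(text: str) -> tuple[float, float]:
        return (quality(text), words_per_page(text, num_pages))

    # max() returns the earlier candidate on a full tie, so the stage-1 winner is kept.
    return max((first_selected, third), key=rank)
```

A caller would pass the document bytes, its page count, and three tool callables, then write the returned text to whatever searchable store is in use.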