US 12,456,317 B2
Systems and methods for detection and correction of OCR text
Masaki Stanley Fujimoto, Provo, UT (US); and Yen-Yun Yu, Murray, UT (US)
Assigned to Ancestry.com Operations Inc., Lehi, UT (US)
Filed by Ancestry.com Operations Inc., Lehi, UT (US)
Filed on Aug. 25, 2022, as Appl. No. 17/895,818.
Claims priority of provisional application 63/237,839, filed on Aug. 27, 2021.
Prior Publication US 2023/0083000 A1, Mar. 16, 2023
Int. Cl. G06V 30/12 (2022.01); G06V 30/19 (2022.01); G06V 30/26 (2022.01)
CPC G06V 30/133 (2022.01) [G06V 30/19147 (2022.01); G06V 30/26 (2022.01)]
20 Claims
 
1. A computer-implemented method, comprising:
receiving a document that includes text obtained at least in part through OCR;
applying an adjusted bidirectional-and-auto-regressive-transformers (BART) model to the text to detect at least one error in a subset of the text, the adjusted BART model having been adjusted from a BART model pretrained to perform a non-optical character recognition (non-OCR) task using a first training dataset comprising corrupted text data and the adjusted BART model further being adjusted from the BART model to perform an OCR task using a second training dataset comprising OCR samples; and
generating, by applying the adjusted BART model to the text of the document, an updated subset of the text correcting the at least one error in the subset of the text.
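
The flow recited in claim 1 (receive OCR text, apply a model to detect errors in a subset of the text, and generate an updated subset) can be illustrated with a minimal sketch. It is not the patented implementation. The names `detect_and_correct`, `Correction`, `toy_model`, and the `window` parameter are hypothetical, and `toy_model` is a lookup-table stand-in for the adjusted BART model so that the example runs with the standard library only. Detecting errors by diffing the model output against the input is also an assumption made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# Hypothetical interface: any callable that maps a text span to a corrected
# span can play the role of the adjusted sequence-to-sequence model.
Seq2Seq = Callable[[str], str]


@dataclass
class Correction:
    start: int       # token index of the error in the full document
    original: str    # text as produced by OCR
    corrected: str   # text proposed by the model


def detect_and_correct(
    text: str, model: Seq2Seq, window: int = 8
) -> Tuple[str, List[Correction]]:
    """Run the model over fixed-size token windows (the "subsets") and
    report every place where its output differs from the OCR input."""
    tokens = text.split()
    updated: List[str] = []
    found: List[Correction] = []
    for i in range(0, len(tokens), window):
        subset = tokens[i:i + window]
        fixed = model(" ".join(subset)).split()
        # Assumption: a token-level diff stands in for error detection.
        matcher = SequenceMatcher(a=subset, b=fixed, autojunk=False)
        for tag, a0, a1, b0, b1 in matcher.get_opcodes():
            if tag != "equal":
                found.append(
                    Correction(i + a0, " ".join(subset[a0:a1]), " ".join(fixed[b0:b1]))
                )
        updated.extend(fixed)
    return " ".join(updated), found


def toy_model(span: str) -> str:
    """Stand-in for the adjusted BART model: fixes two known OCR confusions."""
    fixes = {"bom": "born", "Couuty": "County"}
    return " ".join(fixes.get(tok, tok) for tok in span.split())


ocr_text = "John Smith was bom in Utah Couuty in 1902"
updated_text, corrections = detect_and_correct(ocr_text, toy_model, window=4)
print(updated_text)
for c in corrections:
    print(c.start, c.original, "->", c.corrected)
```

In practice, a fine-tuned BART checkpoint could be wrapped in a function with the same `str -> str` signature and passed in place of `toy_model`; the windowing and diffing logic would stay unchanged.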