US 12,147,575 B2
Automated batch de-identification of unstructured healthcare documents
Andre Sublett, Bainbridge Island, WA (US); Tim Osten, Poulsbo, WA (US); and John Scott, Stephenville, TX (US)
Assigned to CONCORD III, LLC, Seattle, WA (US)
Filed by Concord III, LLC, Seattle, WA (US)
Filed on Jun. 22, 2022, as Appl. No. 17/846,098.
Prior Publication US 2023/0418978 A1, Dec. 28, 2023
Int. Cl. G06F 21/62 (2013.01); G06F 40/174 (2020.01); G06V 30/10 (2022.01); G06V 30/412 (2022.01); G16H 10/60 (2018.01)
CPC G06F 21/6254 (2013.01) [G06F 40/174 (2020.01); G06V 30/10 (2022.01); G06V 30/412 (2022.01); G16H 10/60 (2018.01)]
15 Claims
 
1. A method for the batch de-identification of unstructured health care documents, the method comprising:
optical character recognizing a form-based document, the optical character recognition (OCR) producing an initial set of terms;
identifying initial specific terms amongst the initial set of terms containing protected information and replacing in the form-based document each of the identified initial specific terms with synthetically generated corresponding terms;
performing additional OCR on the form-based document to produce a new set of terms and identifying new specific terms amongst the new set of terms containing protected information;
comparing the new specific terms to the initial specific terms; and,
adding the form-based document to a repository of de-identified documents only if none of the new specific terms are equivalent to corresponding ones of the initial specific terms, but otherwise flagging the form-based document in error.
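
The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions: the "document" is a plain string, `ocr` is a hypothetical stand-in that merely splits text into terms, protected terms are detected with two illustrative regular expressions (`PHI_PATTERNS`), and `synthesize` is a toy generator that shifts each digit. The claim compares new terms to "corresponding ones" of the initial terms; the sketch simplifies this to a set intersection, which is stricter.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors; the claim does not say how protected terms are found.
PHI_PATTERNS = [
    re.compile(r"^\d{3}-\d{2}-\d{4}$"),  # SSN-like term
    re.compile(r"^\d{2}/\d{2}/\d{4}$"),  # date-like term
]

def ocr(document):
    """Stand-in for an OCR engine: the input is already text, so split it into terms."""
    return document.split()

def find_protected(terms):
    """Return the terms that match any of the illustrative PHI patterns."""
    return [t for t in terms if any(p.match(t) for p in PHI_PATTERNS)]

def synthesize(term):
    """Toy synthetic generator: keep the shape, shift every digit by one (mod 10)."""
    return "".join(str((int(c) + 1) % 10) if c.isdigit() else c for c in term)

@dataclass
class Repository:
    accepted: list = field(default_factory=list)  # de-identified documents
    flagged: list = field(default_factory=list)   # documents flagged in error

def deidentify(document, repo, synth=synthesize):
    # First OCR pass: initial set of terms and initial specific (protected) terms.
    initial_terms = ocr(document)
    initial_phi = find_protected(initial_terms)

    # Replace each identified term with a synthetically generated counterpart.
    mapping = {t: synth(t) for t in initial_phi}
    replaced = " ".join(mapping.get(t, t) for t in initial_terms)

    # Second OCR pass on the modified document: new specific terms.
    new_phi = find_protected(ocr(replaced))

    # Accept only if no new protected term equals an initial one; otherwise flag.
    if set(new_phi) & set(initial_phi):
        repo.flagged.append(document)
        return None
    repo.accepted.append(replaced)
    return replaced

repo = Repository()
clean = deidentify("Patient SSN 123-45-6789 DOB 01/02/1990", repo)
# A replacement step that silently fails leaves the original terms in place,
# so the second pass catches them and the document is flagged.
leaky = deidentify("Patient SSN 123-45-6789", repo, synth=lambda t: t)
```

In this sketch the second pass still finds protected-looking terms after a successful replacement (the synthetic ones), which is why the acceptance test compares them against the initial terms rather than requiring that none be found.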