US 12,073,001 B1
Advanced deidentification of information, such as information about a person
Lindsay Thomas Mico, Portland, OR (US); Vivek Tomer, Lake Oswego, OR (US); Yuqing Guo, Amherst, MA (US); and Amar Nadaa Taiyab, Encinitas, CA (US)
Assigned to Providence St. Joseph Health, Renton, WA (US)
Filed by Providence St. Joseph Health, Renton, WA (US)
Filed on Mar. 21, 2024, as Appl. No. 18/612,921.
Int. Cl. G06F 21/62 (2013.01); G06F 40/284 (2020.01)
CPC G06F 21/6254 (2013.01) [G06F 40/284 (2020.01)]
22 Claims
1. One or more instances of computer-readable media collectively having contents configured to cause a computing system to perform a method, the method comprising:
receiving a data object;
identifying in the data object a plurality of constituent portions;
selecting a first constituent portion of the data object containing one or more free text strings;
in a first computing mechanism:
for each free text string:
subjecting the free text string to a trained machine learning model to predict occurrence within the free text string of a personal identifier based on predicting the occurrence within it of certain named entities;
in a repeatable manner, generating a substitute personal identifier from the predicted personal identifier;
creating a copy of the free text string;
in the free text string copy, replacing the predicted personal identifier with the generated substitute personal identifier;
after the replacement, collecting the free text string copies in a modified version of the first constituent portion;
selecting a second constituent portion of the data object distinct from the first constituent portion containing data items of a type other than free text strings;
in a second computing mechanism distinct from the first computing mechanism:
for each data item:
in a repeatable manner, generating a substitute data item from the data item;
collecting the substitute data items in a modified version of the second constituent portion; and
assembling the modified version of the first constituent portion and modified version of the second constituent portion into a modified version of the data object,
wherein the subjecting subjects the free text string to a plurality of trained machine learning models in sequence to collectively predict occurrence within the free text string of a personal identifier based on predicting the occurrence within it of certain named entities.
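
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything named in it is an assumption made for illustration: the data object is taken to be a dictionary with a hypothetical "notes" portion (free text) and a hypothetical "mrns" portion (non-text data items); the "trained machine learning models" are replaced by simple regular-expression stand-ins named name_detector and phone_detector; and "repeatable" substitute generation is approximated with a keyed hash (HMAC-SHA256) under a hypothetical SECRET_KEY, so that the same input always yields the same substitute.

```python
import hashlib
import hmac
import re
from typing import Callable, Dict, List, Tuple

SECRET_KEY = b"demo-key"  # hypothetical key; a real deployment would manage this secret

Span = Tuple[int, int]
Detector = Callable[[str], List[Span]]


def repeatable_substitute(value: str, prefix: str) -> str:
    """Derive a substitute deterministically: same value in, same token out."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


# Regex stand-ins for trained named-entity models (assumption, not ML).
def name_detector(text: str) -> List[Span]:
    return [m.span() for m in re.finditer(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+", text)]


def phone_detector(text: str) -> List[Span]:
    return [m.span() for m in re.finditer(r"\b\d{3}-\d{3}-\d{4}\b", text)]


def predict_identifiers(text: str, detectors: List[Detector]) -> List[Span]:
    """Run the detectors in sequence and pool their non-overlapping spans."""
    spans: List[Span] = []
    for detect in detectors:
        for start, end in detect(text):
            if all(end <= s or start >= e for s, e in spans):
                spans.append((start, end))
    return sorted(spans)


def deidentify_free_text(strings: List[str], detectors: List[Detector]) -> List[str]:
    """First mechanism: replace predicted identifiers in a copy of each string."""
    modified = []
    for original in strings:
        copy = original  # strings are immutable, so the original is never altered
        # Replace from the last span to the first so earlier offsets stay valid.
        for start, end in reversed(predict_identifiers(original, detectors)):
            token = repeatable_substitute(original[start:end], "PII")
            copy = copy[:start] + token + copy[end:]
        modified.append(copy)
    return modified


def deidentify_structured(items: list) -> List[str]:
    """Second mechanism: substitute each non-text data item directly."""
    return [repeatable_substitute(str(item), "VAL") for item in items]


def deidentify(data_object: Dict[str, list]) -> Dict[str, list]:
    """Split the object into portions, process each, and reassemble."""
    return {
        "notes": deidentify_free_text(data_object["notes"], [name_detector, phone_detector]),
        "mrns": deidentify_structured(data_object["mrns"]),
    }
```

In this sketch, calling deidentify twice on the same object returns identical results, and a name that appears in two different notes maps to the same token in both, which is one plausible reading of "in a repeatable manner". The keyed hash is a design choice of the sketch only; the claim does not specify how substitutes are generated.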