US 12,216,799 B2
Systems and methods for computing with private healthcare data
Sankar Ardhanari, Chapel Hill, NC (US); Karthik Murugadoss, Cambridge, MA (US); Murali Aravamudan, Andover, MA (US); and Ajit Rajasekharan, West Windsor, NJ (US)
Assigned to nference, Inc., Cambridge, MA (US)
Filed by nference, Inc., Cambridge, MA (US)
Filed on Oct. 19, 2023, as Appl. No. 18/381,873.
Application 18/381,873 is a continuation of application No. 17/975,489, filed on Oct. 27, 2022, granted, now 11,829,514.
Application 17/975,489 is a continuation of application No. 17/192,564, filed on Mar. 4, 2021, granted, now 11,487,902, issued on Nov. 1, 2022.
Application 17/192,564 is a continuation in part of application No. 16/908,520, filed on Jun. 22, 2020, granted, now 11,545,242, issued on Jan. 3, 2023.
Claims priority of provisional application 63/128,542, filed on Dec. 21, 2020.
Claims priority of provisional application 63/109,769, filed on Nov. 4, 2020.
Claims priority of provisional application 63/012,738, filed on Apr. 20, 2020.
Claims priority of provisional application 62/984,989, filed on Mar. 4, 2020.
Claims priority of provisional application 62/985,003, filed on Mar. 4, 2020.
Claims priority of provisional application 62/962,146, filed on Jan. 16, 2020.
Claims priority of provisional application 62/865,030, filed on Jun. 21, 2019.
Prior Publication US 2024/0119176 A1, Apr. 11, 2024
Int. Cl. G06F 21/60 (2013.01); G06F 21/62 (2013.01); G06N 20/00 (2019.01); G16H 10/60 (2018.01); H04L 29/06 (2006.01)
CPC G06F 21/6254 (2013.01) [G06N 20/00 (2019.01); G16H 10/60 (2018.01)]
19 Claims
 
1. A de-identification method comprising:
receiving a plurality of data sets, wherein the plurality of data sets comprises:
a first data set, wherein the first data set comprises a labeled data set for one or more entity types; and
a second data set, wherein the second data set comprises an unlabeled data set for the one or more entity types;
determining one machine-learning model from a plurality of machine-learning models for each of the one or more entity types;
fine-tuning the determined machine-learning model for each of the one or more entity types, wherein fine-tuning the determined machine-learning model comprises:
creating a plurality of training data sets, wherein the plurality of training data sets comprises:
a first training data set, wherein the first training data set comprises the first data set; and
a second training data set, wherein the second training data set comprises the second data set;
training the determined machine-learning model using the first training data set;
validating the trained machine-learning model, wherein validating the trained machine-learning model further comprises:
generating a recall score for each entity type of the one or more entity types;
comparing the recall score to a threshold for the recall score for each entity type of the one or more entity types; and
updating the trained machine-learning model using the second training data set as a function of the validation; and
obfuscating the second data set using the fine-tuned machine-learning model.
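
The validating step recited above (generating a recall score for each entity type and comparing it to a threshold) may be illustrated by the following minimal Python sketch. It is offered only as one possible reading and is not the claimed implementation. The span representation `(doc_id, start, end, entity_type)`, the exact-match criterion, the entity types `NAME` and `DATE`, the threshold values, and the function names `recall_per_entity_type` and `types_below_threshold` are all assumptions made for illustration and do not appear in the claim.

```python
from collections import defaultdict


def recall_per_entity_type(gold, predicted):
    """Return recall for each entity type found in the gold annotations.

    Both arguments are sets of (doc_id, start, end, entity_type) tuples.
    A gold span counts as recalled only if an identical tuple was predicted
    (exact-match criterion, an assumption of this sketch).
    """
    true_positives = defaultdict(int)
    totals = defaultdict(int)
    for span in gold:
        entity_type = span[3]
        totals[entity_type] += 1
        if span in predicted:
            true_positives[entity_type] += 1
    return {t: true_positives[t] / totals[t] for t in totals}


def types_below_threshold(recall, thresholds):
    """Return, in sorted order, entity types whose recall is under its threshold."""
    return sorted(t for t, r in recall.items() if r < thresholds[t])


# Hypothetical labeled validation spans and model predictions.
gold = {
    ("doc1", 0, 8, "NAME"),
    ("doc1", 20, 30, "DATE"),
    ("doc2", 5, 15, "NAME"),
    ("doc2", 40, 50, "DATE"),
}
predicted = {
    ("doc1", 0, 8, "NAME"),
    ("doc1", 20, 30, "DATE"),
    ("doc2", 40, 50, "DATE"),
}

# Hypothetical per-entity-type recall thresholds.
thresholds = {"NAME": 0.95, "DATE": 0.95}

recall = recall_per_entity_type(gold, predicted)
needs_update = types_below_threshold(recall, thresholds)
```

In this sketch, an entity type returned by `types_below_threshold` would be a candidate for the further updating step using the second training data set; how that update is carried out is not specified here.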