US 12,451,221 B2
Systems and methods for model-assisted data processing to predict biomarker status and testing dates
Auriane Blarre, New York, NY (US); Prakrit Baruah, Boston, MA (US); Guy Amster, Hoboken, NJ (US); Benjamin Irvine, Brooklyn, NY (US); Alexander Rich, Brooklyn, NY (US); and Sabri Eyuboglu, Palo Alto, CA (US)
Assigned to Flatiron Health, Inc., New York, NY (US)
Filed by Flatiron Health, Inc., New York, NY (US)
Filed on Dec. 15, 2022, as Appl. No. 18/082,344.
Claims priority of provisional application 63/290,427, filed on Dec. 16, 2021.
Prior Publication US 2023/0197220 A1, Jun. 22, 2023
Int. Cl. G16H 10/60 (2018.01); G06N 3/08 (2023.01)
CPC G16H 10/60 (2018.01) [G06N 3/08 (2013.01)]
21 Claims
 
1. A model-assisted system for automatically processing medical records storing unstructured data to extract dates associated with a patient event, the system comprising:
memory storing a machine learning model, the machine learning model trained to process one or more text snippets associated with at least one date and output a confidence score indicating a likelihood that the at least one date is associated with the patient event; and
at least one processor programmed to:
automatically extract a date associated with the patient event for each of a plurality of patients from the medical records, the extracting comprising:
access, from a database storing the medical records, at least one document associated with the patient, the at least one document storing unstructured textual data of at least one healthcare provider note, at least one lab record, at least one pathology record, and/or at least one treatment plan, wherein the unstructured textual data comprises freeform text;
identify a plurality of dates in the freeform text by identifying portions of the unstructured textual data that match a predetermined date format;
extract, for each of the plurality of dates identified in the freeform text, a corresponding snippet of the unstructured textual data at least in part by extracting a predetermined number of words or characters before and/or after the identified date in the at least one document thereby obtaining a plurality of text snippets corresponding to the plurality of dates; and
process the plurality of text snippets using the machine learning model stored in the memory to extract the date of the patient event, the processing comprising:
generate, for each of the plurality of text snippets, a set of vectors representing the text snippet at least in part by:
 dividing the text snippet into a plurality of tokens; and
 generating vectors representing the plurality of tokens to obtain the set of vectors representing the text snippet;
provide the sets of vectors representing respective text snippets of the plurality of text snippets as input to the machine learning model to obtain output comprising a plurality of confidence scores, the plurality of confidence scores each indicating a likelihood that a particular one of the plurality of dates is associated with the patient event; and
determine the date associated with the patient event for the patient at least in part by selecting one of the plurality of dates using the plurality of confidence scores output by the machine learning model;
store extracted dates associated with the patient event for the plurality of patients in the database in a predetermined structured format at least in part by storing the extracted dates in a field of the predetermined structured format designated for storage of patient event dates; and
cause a user device to display a user interface displaying one or more of the extracted dates in the user interface at least in part by reading the one or more extracted dates from the field of the predetermined structured format.
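
The extraction steps recited in claim 1 (matching dates against a predetermined format, taking a fixed window of words around each date, tokenizing and vectorizing each snippet, scoring it, and selecting a date by confidence) can be illustrated with a minimal Python sketch. This is not the patented implementation. The MM/DD/YYYY pattern, the five-word window, the vocabulary, the one-hot vectors, and the hand-set weights are all hypothetical stand-ins chosen for illustration; in particular, `confidence` is a placeholder for a trained machine learning model, and the storage and user-interface steps are omitted.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the claim only says "a predetermined date format"; MM/DD/YYYY is one example.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
# Assumption: the "predetermined number of words" kept on each side of a date.
WINDOW_WORDS = 5

# Hypothetical vocabulary and hand-set weights standing in for a trained model.
VOCAB = ["tested", "positive", "biopsy", "born", "visit"]
WEIGHTS = [2.0, 1.5, 1.0, -3.0, -1.0]


@dataclass
class Snippet:
    date: str  # the date string as it appears in the document
    text: str  # the date plus its surrounding words


def extract_snippets(document: str, window: int = WINDOW_WORDS) -> List[Snippet]:
    """Find each date matching DATE_PATTERN and keep `window` words on each side."""
    snippets = []
    for match in DATE_PATTERN.finditer(document):
        before = document[:match.start()].split()[-window:]
        after = document[match.end():].split()[:window]
        text = " ".join(before + [match.group()] + after)
        snippets.append(Snippet(date=match.group(), text=text))
    return snippets


def tokenize(text: str) -> List[str]:
    """Divide a snippet into lowercase tokens (letters, digits, and slashes)."""
    return re.findall(r"[a-z0-9/]+", text.lower())


def embed(tokens: List[str]) -> List[List[float]]:
    """Represent each token as a one-hot vector over VOCAB (all zeros if unknown)."""
    return [[1.0 if tok == word else 0.0 for word in VOCAB] for tok in tokens]


def confidence(vectors: List[List[float]]) -> float:
    """Placeholder model: logistic score over the summed, weighted token vectors."""
    logit = sum(w * x for vec in vectors for w, x in zip(WEIGHTS, vec))
    return 1.0 / (1.0 + math.exp(-logit))


def select_event_date(document: str) -> Optional[str]:
    """Return the date whose snippet has the highest confidence, or None if no date is found."""
    snippets = extract_snippets(document)
    if not snippets:
        return None
    best = max(snippets, key=lambda s: confidence(embed(tokenize(s.text))))
    return best.date
```

For the text `Patient born 03/14/1961 in Ohio. Biopsy sample tested positive for EGFR on 07/22/2021 per lab.`, the sketch finds two dates, scores the snippet around the second one higher, and `select_event_date` returns `07/22/2021`.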