US 12,020,786 B2
Model for health record classification
John Zhu, San Mateo, CA (US); Noah Lieberman, San Francisco, CA (US); Ha Pham, Los Altos, CA (US); and Vishnuvyas Sethumadhavan, Mountain View, CA (US)
Assigned to Apixio, LLC, San Mateo, CA (US)
Filed by Apixio, LLC, San Mateo, CA (US)
Filed on May 7, 2020, as Appl. No. 16/869,560.
Claims priority of provisional application 62/846,555, filed on May 10, 2019.
Prior Publication US 2020/0356825 A1, Nov. 12, 2020
Int. Cl. G16H 10/60 (2018.01); G06F 18/21 (2023.01); G06F 18/231 (2023.01); G06F 18/2413 (2023.01); G06V 30/413 (2022.01); G16H 15/00 (2018.01); G16H 40/20 (2018.01); G16H 50/20 (2018.01); G16H 50/70 (2018.01)

CPC G16H 10/60 (2018.01) [G06F 18/2178 (2023.01); G06F 18/231 (2023.01); G06F 18/24147 (2023.01); G06V 30/413 (2022.01); G16H 15/00 (2018.01); G16H 40/20 (2018.01); G16H 50/20 (2018.01); G16H 50/70 (2018.01)]

18 Claims

1. A method for building an electronic medical record (EMR) document analysis model implemented by at least one processor in communication with at least one memory, the method comprising:

collecting EMR documents from a database;

applying a latent Dirichlet allocation topic model to the EMR documents to generate cluster membership data comprising clusters associated with topics of the EMR documents, wherein one or more of the clusters comprises at least one thousand dimensions;

storing the cluster membership data as metadata associated with the EMR documents;

training the EMR document analysis model based on the clusters of the EMR documents and the metadata to produce a trained EMR document analysis model, wherein the clusters are based on the metadata and the topics associated therewith;

applying the trained EMR document analysis model to the EMR documents to produce adjusted cluster membership data for each of the EMR documents, the adjusted cluster membership data comprising a ranking associated with a respective distance of each of the EMR documents from a respective center of the one or more clusters, wherein being within a first distance from the respective center indicates strong correlation to the one or more clusters and a lower ranking, wherein being outside the first distance from the respective center indicates a weak correlation to the one or more clusters and a higher ranking;

updating the metadata based on the adjusted cluster membership;

receiving an input that identifies a condition or diagnosis;

creating labeled sample EMR documents by assigning EMR analysis labels associated with the condition or diagnosis to sample EMR documents, the sample EMR documents comprising EMR documents nearest a membership boundary of one or more of the clusters and outside the first distance from the respective center based on the updated metadata; and

training the trained EMR document analysis model with the labeled sample EMR documents to produce an improved EMR document analysis model with respect to the condition or diagnosis, thereby producing the improved EMR document analysis model based on training the trained EMR document analysis model with the labeled sample EMR documents indicating the weak correlation to the one or more clusters.
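
For illustration only, the distance-based ranking and boundary-sample selection recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the per-document topic vectors are assumed to come from a latent Dirichlet allocation model fitted upstream, each document is assigned to the cluster of its dominant topic, the cluster center is taken to be the mean vector, distance is Euclidean, and "nearest a membership boundary" is approximated as "farthest from the center". The claim specifies none of these choices. All function names, document identifiers, and the `first_distance` value are hypothetical, and the toy vectors have two dimensions rather than the thousand or more recited in the claim.

```python
import math
from collections import defaultdict


def assign_clusters(doc_topics):
    """Assign each document to the cluster of its dominant topic (assumption)."""
    return {
        doc_id: max(range(len(vec)), key=vec.__getitem__)
        for doc_id, vec in doc_topics.items()
    }


def cluster_centers(doc_topics, membership):
    """Compute each cluster center as the mean topic vector (assumption)."""
    groups = defaultdict(list)
    for doc_id, cluster in membership.items():
        groups[cluster].append(doc_topics[doc_id])
    return {
        cluster: [sum(col) / len(vecs) for col in zip(*vecs)]
        for cluster, vecs in groups.items()
    }


def rank_by_distance(doc_topics, membership, centers):
    """Build metadata per document: cluster, distance to center, and rank.

    Rank 1 is the document closest to the center, so a weaker correlation
    (greater distance) yields a higher ranking, as in the claim.
    """
    by_cluster = defaultdict(list)
    for doc_id, cluster in membership.items():
        distance = math.dist(doc_topics[doc_id], centers[cluster])
        by_cluster[cluster].append((distance, doc_id))
    metadata = {}
    for cluster, items in by_cluster.items():
        for rank, (distance, doc_id) in enumerate(sorted(items), start=1):
            metadata[doc_id] = {"cluster": cluster, "distance": distance, "rank": rank}
    return metadata


def select_boundary_samples(metadata, first_distance, per_cluster=1):
    """Pick, per cluster, the highest-ranked documents outside first_distance."""
    outside = defaultdict(list)
    for doc_id, meta in metadata.items():
        if meta["distance"] > first_distance:
            outside[meta["cluster"]].append((meta["rank"], doc_id))
    picked = []
    for items in outside.values():
        top = sorted(items, reverse=True)[:per_cluster]
        picked.extend(doc_id for _, doc_id in top)
    return sorted(picked)


# Hypothetical topic proportions for six documents over two topics.
doc_topics = {
    "doc_a": [0.90, 0.10],
    "doc_b": [0.80, 0.20],
    "doc_c": [0.55, 0.45],
    "doc_d": [0.10, 0.90],
    "doc_e": [0.20, 0.80],
    "doc_f": [0.45, 0.55],
}

membership = assign_clusters(doc_topics)
centers = cluster_centers(doc_topics, membership)
metadata = rank_by_distance(doc_topics, membership, centers)
samples = select_boundary_samples(metadata, first_distance=0.25)
print(samples)
```

In this toy data, `doc_c` and `doc_f` sit closest to the boundary between the two clusters, so they are the documents that would be handed off for labeling with respect to the input condition or diagnosis. The labeling and retraining steps are left out of the sketch because the claim does not specify how they are carried out.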