US 11,748,359 B2
Systems and methods for cohort analysis using compressed data objects enabling fast memory lookups
Nigam H. Shah, Menlo Park, CA (US); Vladimir Polony, Novato, CA (US); Juan Manuel Banda, Mableton, GA (US); and Alison Victoria Callahan, Oakland, CA (US)
Assigned to The Board of Trustees of the Leland Stanford Junior University, Stanford, CA (US)
Filed by The Board of Trustees of the Leland Stanford Junior University, Stanford, CA (US)
Filed on Dec. 22, 2021, as Appl. No. 17/645,569.
Application 17/645,569 is a continuation of application No. 16/610,440, granted, now Pat. No. 11,210,296, previously published as PCT/US2018/030413, filed on May 1, 2018.
Claims priority of provisional application 62/492,779, filed on May 1, 2017.
Prior Publication US 2022/0188318 A1, Jun. 16, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/2455 (2019.01); G06F 16/22 (2019.01); G06F 16/25 (2019.01); G06F 21/62 (2013.01)

CPC G06F 16/24568 (2019.01) [G06F 16/2255 (2019.01); G06F 16/2272 (2019.01); G06F 16/254 (2019.01); G06F 21/6245 (2013.01)]

18 Claims

1. A computer-implemented method for data analysis, comprising:

receiving, for each of a plurality of patients, unstructured medical information for the patient from a plurality of different sources of medical information and generating a data object for the patient using a plurality of different models that provide structure for processing the unstructured medical information for the different sources of medical information;

selecting a data type for at least one data object in a plurality of data objects that is optimal for encoding the unstructured information into the at least one data object based on properties of the at least one data object, wherein the at least one data object comprises at least one header and a plurality of data components, wherein the at least one header comprises information regarding the selected data type and memory mappings of the plurality of data components within a body of the at least one data object;

encoding the unstructured information in the at least one data object of the selected data type, wherein the unstructured information is encoded within the plurality of data components in a serialized in-memory byte-stream format;

receiving a search query to analyze patient medical data with a plurality of parameters related to a plurality of data components for a plurality of patient demographics;

determining a particular data component relevant to the search query;

retrieving a data value directly from the particular data component of the at least one data object using the header of the at least one data object to identify a memory location of the particular data component and without deserialization of the at least one data object, wherein the data value is retrieved in a serialized in-memory byte-stream format; and

generating an identification of a cohort of patients for the search query that includes the data value.
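
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The byte layout, the function names (`select_type`, `encode_patient`, `read_component`, `component_contains`), the use of non-negative integer codes as component values, and the sample patients are all assumptions made for illustration. The sketch picks the narrowest integer type that fits an object's values, records that type and each component's offset in a header, and answers a query by reading only the addressed component from the byte stream instead of decoding the whole object.

```python
import struct

# Hypothetical layout (an assumption, not the format used in the patent):
#   header: 1-byte type code, uint16 component count,
#           then one (uint32 offset, uint32 value count) pair per component
#   body:   each component stored as a packed little-endian integer array
TYPE_LIMITS = {"B": 0xFF, "H": 0xFFFF, "I": 0xFFFFFFFF}
FIXED_HEADER = struct.calcsize("<cH")   # 3 bytes
ENTRY_SIZE = struct.calcsize("<II")     # 8 bytes per component


def select_type(components):
    """Pick the narrowest unsigned type that holds every value in the object."""
    largest = max((v for values in components for v in values), default=0)
    for code, limit in TYPE_LIMITS.items():
        if largest <= limit:
            return code
    raise ValueError("value does not fit in 32 bits")


def encode_patient(components):
    """Serialize a list of integer lists into one header + body byte string."""
    code = select_type(components)
    header_size = FIXED_HEADER + ENTRY_SIZE * len(components)
    header = bytearray(struct.pack("<cH", code.encode(), len(components)))
    body = bytearray()
    for values in components:
        # The header records where this component starts within the object.
        header += struct.pack("<II", header_size + len(body), len(values))
        body += struct.pack(f"<{len(values)}{code}", *values)
    return bytes(header + body)


def read_component(blob, index):
    """Return the raw bytes of one component, located through the header."""
    view = memoryview(blob)
    code, _count = struct.unpack_from("<cH", view, 0)
    offset, n = struct.unpack_from("<II", view, FIXED_HEADER + ENTRY_SIZE * index)
    width = struct.calcsize("<" + code.decode())
    return view[offset:offset + n * width], code.decode()


def component_contains(blob, index, value):
    """Check one component for a value; other components are never touched."""
    raw, code = read_component(blob, index)
    return any(v == value for (v,) in struct.iter_unpack("<" + code, raw))


DIAGNOSES, PRESCRIPTIONS = 0, 1  # hypothetical component indices

patients = {
    "p1": encode_patient([[250, 401], [7]]),
    "p2": encode_patient([[401], [7, 9]]),
    "p3": encode_patient([[70000], []]),
}

# Cohort query: every patient whose diagnosis component contains code 401.
cohort = sorted(
    pid for pid, blob in patients.items()
    if component_contains(blob, DIAGNOSES, 401)
)
print(cohort)
```

In this sketch the lookup cost depends on the size of the requested component, not on the size of the whole patient object, because the header supplies the component's memory location directly.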