US 12,436,955 B2
Systems and methods for cohort analysis using compressed data objects enabling fast memory lookups
Nigam H. Shah, Menlo Park, CA (US); Vladimir Polony, Novato, CA (US); Juan Manuel Banda, Mableton, GA (US); and Alison Victoria Callahan, Oakland, CA (US)
Assigned to The Board of Trustees of the Leland Stanford Junior University, Stanford, CA (US)
Filed by The Board of Trustees of the Leland Stanford Junior University, Stanford, CA (US)
Filed on Jul. 18, 2023, as Appl. No. 18/354,257.
Application 18/354,257 is a continuation of application No. 17/645,569, filed on Dec. 22, 2021, granted, now 11,748,359.
Application 17/645,569 is a continuation of application No. 16/610,440, granted, now 11,210,296, issued on Nov. 1, 2019, previously published as PCT/US2018/030413, filed on May 1, 2018.
Claims priority of provisional application 62/492,779, filed on May 1, 2017.
Prior Publication US 2023/0367775 A1, Nov. 16, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/2455 (2019.01); G06F 16/22 (2019.01); G06F 16/25 (2019.01); G06F 21/62 (2013.01)
CPC G06F 16/24568 (2019.01) [G06F 16/2255 (2019.01); G06F 16/2272 (2019.01); G06F 16/254 (2019.01); G06F 21/6245 (2013.01)]
20 Claims
 
1. A method for computer-implemented data analysis, comprising:
  receiving a search query to analyze medical data related to a patient, wherein the search query comprises a plurality of parameters related to data components for a plurality of patient demographics;
  determining a data object storing a particular data component relevant to the search query, wherein the data object:
    corresponds to the patient; and
    comprises:
      a plurality of data components, including the particular data component, wherein:
        each data component of the plurality of data components corresponds to a different type of data value related to the medical data,
        medical data for the patient is encoded, within the plurality of data components, in a serialized in-memory byte-stream format,
        the medical data is received from a plurality of different medical information sources, and
        the data object is stored, in its entirety, at a unique and continuous memory location; and
      at least one header providing information regarding memory mappings of the plurality of data components within a body of the data object, the at least one header comprising:
        an offset for each of the plurality of data components in the body of the data object, and
        encoding information, wherein the encoding information identifies at least one data type used in storing the offset for each of the plurality of data components;
  retrieving a particular data value, responding to the search query, directly from the particular data component, wherein:
    retrieving the particular data value comprises using the encoding information, the memory mappings and the offset to identify a memory location of the particular data component, and
    the particular data value is retrieved in a serialized in-memory byte-stream format, while the data object remains serialized; and
  generating an identification of a cohort of patients for the search query, wherein the identification includes the particular data value.
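
The data object recited in claim 1 (a header holding per-component offsets plus encoding information, followed by a body read in place while still serialized) can be illustrated with a minimal Python sketch. This is not the patented implementation. The byte layout, the encoding codes in OFFSET_TYPES, and the function names pack_patient, locate, read_value and find_cohort are all hypothetical choices made for illustration; the claim does not specify them.

```python
import struct

# Assumed layout (hypothetical, for illustration only):
#   byte 0 : encoding code naming the integer type used to store offsets
#   byte 1 : number of data components N
#   then   : N offsets, each relative to the start of the body
#   body   : components back to back; each is a 2-byte count followed by
#            that many 4-byte unsigned integers, all little-endian
OFFSET_TYPES = {1: "H", 2: "I"}  # 1 -> 2-byte offsets, 2 -> 4-byte offsets


def pack_patient(components):
    """Serialize a list of integer lists into one contiguous bytes object."""
    bodies = [struct.pack(f"<H{len(c)}I", len(c), *c) for c in components]
    offsets, pos = [], 0
    for body in bodies:
        offsets.append(pos)
        pos += len(body)
    # Pick the narrowest offset type that can address the whole body.
    code = 1 if pos <= 0xFFFF else 2
    header = struct.pack(
        f"<BB{len(offsets)}{OFFSET_TYPES[code]}", code, len(offsets), *offsets
    )
    return header + b"".join(bodies)


def locate(blob, component):
    """Use the header to find a component; returns (first value position, count)."""
    code, n = struct.unpack_from("<BB", blob, 0)
    if not 0 <= component < n:
        raise IndexError("no such component")
    fmt = "<" + OFFSET_TYPES[code]  # encoding info tells us how offsets are stored
    width = struct.calcsize(fmt)
    (offset,) = struct.unpack_from(fmt, blob, 2 + component * width)
    start = 2 + n * width + offset  # header size plus offset into the body
    (count,) = struct.unpack_from("<H", blob, start)
    return start + 2, count


def read_value(blob, component, index):
    """Read one value in place; the rest of the object is never deserialized."""
    first, count = locate(blob, component)
    if not 0 <= index < count:
        raise IndexError("no such value")
    return struct.unpack_from("<I", blob, first + 4 * index)[0]


def find_cohort(patients, component, wanted):
    """Return ids of patients whose chosen component contains `wanted`."""
    cohort = []
    for patient_id, blob in patients.items():
        first, count = locate(blob, component)
        if any(
            struct.unpack_from("<I", blob, first + 4 * i)[0] == wanted
            for i in range(count)
        ):
            cohort.append(patient_id)
    return cohort


# Component 0 holds a birth year, component 1 a list of made-up diagnosis codes.
patients = {
    "p1": pack_patient([[1990], [250, 401]]),
    "p2": pack_patient([[1975], [786]]),
}
matching = find_cohort(patients, component=1, wanted=401)
```

In this sketch the one-byte encoding code lets a small object use 2-byte offsets and a large one 4-byte offsets, which is one plausible reading of "encoding information [that] identifies at least one data type used in storing the offset". Each lookup touches only the header entry and the bytes of the requested value.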