US 12,387,816 B2
	Processes for genetic and clinical data evaluation and classification of complex human traits
Jingjing Li, Stanford, CA (US); Sai Zhang, Palo Alto, CA (US); Michael P. Snyder, Stanford, CA (US); Cuiping Pan, Palo Alto, CA (US); and Philip S. Tsao, Palo Alto, CA (US)
Assigned to The Board of Trustees of the Leland Stanford Junior University, Stanford, CA (US); and The United States Government as represented by the Department of Veterans Affairs, Washington, DC (US)
Appl. No. 16/961,120
Filed by The Board of Trustees of the Leland Stanford Junior University, Stanford, CA (US); and The United States Government as represented by the Department of Veterans Affairs, Washington, DC (US)
PCT Filed Jan. 9, 2019, PCT No. PCT/US2019/012848 § 371(c)(1), (2) Date Jul. 9, 2020, PCT Pub. No. WO2019/139950, PCT Pub. Date Jul. 18, 2019.
Claims priority of provisional application 62/727,260, filed on Sep. 5, 2018.
Claims priority of provisional application 62/615,304, filed on Jan. 9, 2018.
Prior Publication US 2021/0158894 A1, May 27, 2021
Int. Cl. G16B 20/20 (2019.01); G06F 17/18 (2006.01); G16B 40/20 (2019.01)

CPC G16B 20/20 (2019.02) [G06F 17/18 (2013.01); G16B 40/20 (2019.02)]

19 Claims

1. A method of performing targeted sequencing of an individual's genetic material to determine if the individual has a propensity for a complex disorder, comprising:

obtaining a first variant profile and a second variant profile, wherein the first variant profile is derived from sequencing data of a first cohort of patients having a complex disorder and the second variant profile is derived from sequencing data from a second cohort of patients not having the complex disorder;

assigning each variant within the first variant profile and each variant within the second variant profile a predicted deleteriousness effect;

determining a burden of each variant within the first variant profile and each variant within the second variant profile based on the predicted deleteriousness effect and its frequency within its cohort;

for each variant profile, determining an aggregated variant burden for each gene of a human genome, wherein the aggregated variant burden is an aggregation of burdens of the variants in the gene;

training a classification model with the first variant profile and the second variant profile to distinguish individuals having a complex disorder from individuals not having the complex disorder utilizing an aggregated variant burden for each gene in a subset of genes of the set of genes as features, wherein each gene of the subset of genes has an elevated aggregated variant burden within individuals having the complex disorder, wherein the subset of genes is a minimum set of genes learned by the classification model to solve a logistic regression problem having an optimization objective that is an average cross-entropy of the subset of genes;

identifying the subset of genes, using the trained classification model, that are able to distinguish individuals having a complex disorder from individuals not having the complex disorder based upon the aggregated variant burden of the subset of genes;

synthesizing a set of nucleic acid oligomers consisting of sequences that hybridize to sequences of the subset of genes;

obtaining genetic material of an individual;

performing capture hybridization or amplification targeting regions of the genetic material to prepare a sequencing library utilizing the set of nucleic acid oligomers, wherein the regions consist of loci of the identified subset of genes; and

performing targeted sequencing utilizing the sequencing library to yield a targeted sequencing result of genes identified to be burdened within the complex disorder.
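The computational steps of claim 1 (per-variant burden, per-gene aggregation, and a logistic-regression classifier on average cross-entropy that keeps a minimal gene subset) can be illustrated with the minimal Python sketch below. It is not the patented implementation. The following are all assumptions: the burden formula (deleteriousness times cohort frequency), summation as the aggregation, per-individual gene burdens as features, feature standardization, and an L1 penalty used as a stand-in for the "minimum set of genes". Every function name and the toy GENE_A/GENE_B data are hypothetical.

```python
import math
from collections import defaultdict


def gene_burdens(variants):
    """Aggregate per-gene burden for one individual (hypothetical helper).

    variants: iterable of (gene, deleteriousness, cohort_frequency).
    ASSUMPTION: variant burden = deleteriousness * cohort_frequency,
    and a gene's aggregated burden is the sum over its variants.
    """
    totals = defaultdict(float)
    for gene, deleteriousness, cohort_frequency in variants:
        totals[gene] += deleteriousness * cohort_frequency
    return dict(totals)


def standardize(X):
    """Center and scale each column; constant columns are scaled by 1."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|; small weights become exactly zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, epochs=2000):
    """Minimize average cross-entropy + lam * ||w||_1 by proximal gradient.

    ASSUMPTION: the L1 penalty approximates the "minimum set of genes";
    the intercept b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(d)]
        b -= lr * grad_b
    return w, b


def selected_genes(genes, w):
    # Genes kept by the sparse model whose burden raises predicted risk.
    return [g for g, wj in zip(genes, w) if wj > 0.0]


# Toy cohorts (hypothetical): GENE_A is burdened only in cases,
# GENE_B has the same burden distribution in cases and controls.
cases = [
    [("GENE_A", 0.8, 0.5), ("GENE_B", 0.4, 0.5)],
    [("GENE_A", 0.8, 0.5), ("GENE_B", 0.6, 1.0)],
]
controls = [
    [("GENE_B", 0.4, 0.5)],
    [("GENE_B", 0.6, 1.0)],
]
profiles = [gene_burdens(v) for v in cases + controls]
genes = sorted({g for p in profiles for g in p})
X = standardize([[p.get(g, 0.0) for g in genes] for p in profiles])
y = [1] * len(cases) + [0] * len(controls)
w, b = fit_l1_logistic(X, y)
print(selected_genes(genes, w))  # ['GENE_A']
```

An L1 penalty is a common convex surrogate for choosing as few features as possible: genes whose weights are driven to exactly zero drop out, and only genes with positive weights (higher burden, higher predicted risk) are reported as the subset to target. Because GENE_B carries the same burden in both toy groups, its weight stays at zero and only GENE_A would be passed on to oligomer design.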