US 12,191,001 B2
	Population frequency modeling for quantitative variant pathogenicity estimation
Toby Manders, San Francisco, CA (US); Keith Nykamp, Berkeley, CA (US); Alexandre Colavin, San Diego, CA (US); and Yuya Kobayashi, Menlo Park, CA (US)
Assigned to Laboratory Corporation of America Holdings, Burlington, NC (US)
Appl. No. 18/700,257
Filed by Laboratory Corporation of America Holdings, Burlington, NC (US)
PCT Filed Oct. 31, 2023, PCT No. PCT/US2023/036543 § 371(c)(1), (2) Date Apr. 10, 2024, PCT Pub. No. WO2024/097261, PCT Pub. Date May 10, 2024.
Claims priority of provisional application 63/421,430, filed on Nov. 1, 2022.
Prior Publication US 2024/0339177 A1, Oct. 10, 2024
Int. Cl. G16B 40/00 (2019.01); G16B 20/40 (2019.01)

CPC G16B 40/00 (2019.02) [G16B 20/40 (2019.02)]

48 Claims

1. A method for configuring a machine learning model to model population frequency for variant classification, the method comprising:

applying a logistic regression model to a first set of population data for a first set of genes, wherein an item of the first set of population data comprises, for a variant located at a position within a gene of the first set of genes, a set of features comprising at least one gene-level feature, at least one variant-level feature, and at least one population frequency meta-feature, and a reference label that indicates whether the variant is benign or pathogenic, wherein the at least one population frequency meta-feature quantifies predictive value of allele frequency in the gene, wherein the applying comprises computing a gene-level constraint; including the gene-level constraint in the at least one gene-level feature; computing an allele frequency; including the allele frequency in the at least one variant-level feature; including, in the at least one population frequency meta-feature, a mathematical combination of the gene-level constraint and the allele frequency;

and applying the logistic regression model to the set of features including the mathematical combination of the gene-level constraint and the allele frequency, wherein the trained logistic regression model is capable of outputting variant pathogenicity estimates that satisfy the at least one second performance criterion based on the set of features including the mathematical combination of the gene-level constraint and the allele frequency;

for each item of the first set of population data, evaluating a variant classification prediction output by the logistic regression model based on an expected variant classification indicated by the reference label; and

iteratively adjusting a value of at least one parameter or coefficient of the logistic regression model until output of a loss function computed based on the variant classification prediction output by the logistic regression model satisfies at least one first performance criterion, to produce a trained logistic regression model, wherein the trained logistic regression model is capable of outputting variant pathogenicity estimates that satisfy at least one second performance criterion.
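The claim recites the training procedure only as a sequence of steps. The Python sketch below is one hypothetical reading of them, not the patented implementation. Several details are assumptions introduced for illustration. The `VariantItem` record, the helper names, and the toy data are invented. The gene-level constraint is taken as a precomputed score in [0, 1] rather than computed here. The "mathematical combination" used as the meta-feature is assumed to be the product of constraint and log10(allele frequency). A mean cross-entropy threshold stands in for the first performance criterion, and accuracy stands in for the second. Under those assumptions, the sketch builds the gene-level, variant-level, and meta-feature inputs, applies a logistic model, compares each prediction with the reference label, and adjusts the coefficients by batch gradient descent until the loss criterion is met.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class VariantItem:
    """Hypothetical record for one item of the population data."""
    gene_constraint: float   # gene-level feature; assumed precomputed score in [0, 1]
    allele_frequency: float  # variant-level feature; population allele frequency
    is_pathogenic: int       # reference label: 1 = pathogenic, 0 = benign


def features(item: VariantItem, af_floor: float = 1e-6) -> List[float]:
    """Return [gene-level, variant-level, meta-feature] for one variant."""
    # Floor the allele frequency so unobserved variants (AF = 0) stay finite.
    log_af = math.log10(max(item.allele_frequency, af_floor))
    # Assumed "mathematical combination": constraint x log10(AF), which lets the
    # model weight rarity differently depending on how constrained the gene is.
    meta = item.gene_constraint * log_af
    return [item.gene_constraint, log_af, meta]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Estimated probability that the variant is pathogenic."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def log_loss(weights: List[float], bias: float, data: List[VariantItem]) -> float:
    """Mean binary cross-entropy of predictions against the reference labels."""
    eps = 1e-12
    total = 0.0
    for item in data:
        p = predict(weights, bias, features(item))
        y = item.is_pathogenic
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(data)


def train(data: List[VariantItem], lr: float = 0.1,
          loss_target: float = 0.3, max_iters: int = 10_000):
    """Batch gradient descent until the loss meets the (first) criterion."""
    n = len(features(data[0]))
    weights, bias = [0.0] * n, 0.0
    for _ in range(max_iters):
        if log_loss(weights, bias, data) <= loss_target:
            break  # first performance criterion satisfied
        grad_w, grad_b = [0.0] * n, 0.0
        for item in data:
            x = features(item)
            # Compare prediction with the reference label; p - y is dLoss/dz.
            err = predict(weights, bias, x) - item.is_pathogenic
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        m = len(data)
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def accuracy(weights, bias, data, threshold=0.5):
    """One possible second performance criterion: fraction classified correctly."""
    hits = sum(
        (predict(weights, bias, features(item)) >= threshold) == bool(item.is_pathogenic)
        for item in data
    )
    return hits / len(data)


# Made-up toy data: rare variants in constrained genes labeled pathogenic,
# common variants labeled benign.
TOY_DATA = [
    VariantItem(0.90, 1e-5, 1), VariantItem(0.80, 2e-5, 1),
    VariantItem(0.95, 1e-6, 1), VariantItem(0.70, 5e-5, 1),
    VariantItem(0.20, 0.05, 0), VariantItem(0.10, 0.20, 0),
    VariantItem(0.30, 0.01, 0), VariantItem(0.90, 0.10, 0),
]

if __name__ == "__main__":
    w, b = train(TOY_DATA)
    print("loss:", round(log_loss(w, b, TOY_DATA), 3))
    print("accuracy:", accuracy(w, b, TOY_DATA))
```

Batch gradient descent on cross-entropy is only one way to "iteratively adjust" the coefficients. A practical setup would more likely use a library solver and would evaluate the second criterion on held-out variants rather than on the training items.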