US 11,942,189 B2
Drug efficacy prediction for treatment of genetic disease
Seyed Ali Kazemi Oskooei, Zurich (CH); Maria Rodriguez Martinez, Thalwil (CH); and Matteo Manica, Zurich (CH)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jan. 16, 2019, as Appl. No. 16/249,110.
Prior Publication US 2020/0227134 A1, Jul. 16, 2020
Int. Cl. G16H 50/50 (2018.01); G06N 20/00 (2019.01); G16B 5/20 (2019.01); G16B 25/10 (2019.01); G16H 20/10 (2018.01)
CPC G16B 5/20 (2019.02) [G06N 20/00 (2019.01); G16B 25/10 (2019.02); G16H 20/10 (2018.01); G16H 50/50 (2018.01)]
19 Claims
 
1. A computer-implemented method, the method comprising:
determining relevance of a gene to efficacy of a particular pharmaceutical drug based on prior knowledge indicating known or predicted sensitivity of genetic characteristics to action of the particular pharmaceutical drug;
storing, based on a dataset correlating data for disease-cell samples with drug efficacy values for the disease-cell samples, bias weights corresponding to respective genes in samples, the bias weights being dependent on relevance of respective genes to drug efficacy for the respective genes and being determined prior to generating a machine learning model, wherein bias weights assigned to genes relevant to drug efficacy are assigned higher values than what are assigned as values for bias weights assigned to genes that are non-relevant to drug efficacy;
generating the machine learning model for drug efficacy prediction in treatment of genetic disease from the dataset, the generating the machine learning model comprising:
processing said dataset via a tree ensemble method wherein decision trees are grown with splits corresponding to respective genes in said samples, the genes for the splits being chosen from respective subsets of said genes and based on respective selection probabilities dependent on corresponding bias weights and the bias weights are translated during the processing into the selection probabilities for generating the splits; and
storing said machine learning model for prediction of drug efficacy values based on gene expression data of patients.
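
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it is a toy tree ensemble in plain Python in which each split considers a subset of genes drawn according to selection probabilities derived from bias weights. All function names, the proportional normalisation used to translate weights into probabilities, the subset size k, and the synthetic data are assumptions made for illustration; the claim itself only states that the weights are translated into selection probabilities.

```python
import random
from statistics import fmean

def selection_probabilities(bias_weights):
    """Translate bias weights into selection probabilities (assumed: proportional)."""
    total = sum(bias_weights)
    return [w / total for w in bias_weights]

def sample_genes(probs, k, rng):
    """Draw k distinct gene indices; assumes all probabilities are positive."""
    candidates, weights, chosen = list(range(len(probs))), list(probs), []
    for _ in range(min(k, len(candidates))):
        i = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.append(candidates.pop(i))
        weights.pop(i)
    return chosen

def sse(ys):
    """Sum of squared errors around the mean (regression split criterion)."""
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)

def grow_tree(X, y, probs, k, rng, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree; each split picks from a biased random gene subset."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return fmean(y)  # leaf: mean efficacy of the samples reaching it
    best = None
    for g in sample_genes(probs, k, rng):
        for t in sorted(set(row[g] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[g] <= t]
            right = [i for i, row in enumerate(X) if row[g] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, g, t, left, right)
    if best is None:
        return fmean(y)
    _, g, t, left, right = best
    def sub(idx):
        return [X[i] for i in idx], [y[i] for i in idx]
    return (g, t,
            grow_tree(*sub(left), probs, k, rng, depth + 1, max_depth, min_leaf),
            grow_tree(*sub(right), probs, k, rng, depth + 1, max_depth, min_leaf))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are (gene, threshold, left, right)."""
    while isinstance(node, tuple):
        g, t, left, right = node
        node = left if x[g] <= t else right
    return node

def fit_forest(X, y, bias_weights, n_trees=25, k=2, seed=0):
    """Fit an ensemble of trees on bootstrap samples with biased gene selection."""
    rng = random.Random(seed)
    probs = selection_probabilities(bias_weights)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]  # bootstrap sample of the rows
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], probs, k, rng))
    return forest

def predict(forest, x):
    """Average the per-tree predictions for one expression profile."""
    return fmean([predict_tree(tree, x) for tree in forest])

# Synthetic toy data (an assumption, not from the patent): 40 samples, 4 genes,
# with efficacy driven only by gene 0.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(4)] for _ in range(40)]
y = [1.0 if row[0] > 0.5 else 0.0 for row in X]

# Bias weights fixed before model generation: gene 0 is treated as relevant.
bias_weights = [5.0, 1.0, 1.0, 1.0]
probs = selection_probabilities(bias_weights)
forest = fit_forest(X, y, bias_weights)
```

Calling predict(forest, profile) with a list of four expression values returns the averaged efficacy estimate for that profile. In this sketch the gene subset is drawn afresh at every split, so a gene with a higher bias weight is more likely to be a candidate at each node, although the split finally chosen still depends on the data.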