CPC G16B 40/20 (2019.02) [C12Q 1/6886 (2013.01); G16B 20/20 (2019.02); C12Q 1/686 (2013.01); C12Q 1/6874 (2013.01); C12Q 2600/156 (2013.01)]
14 Claims
1. A method of determining tumor purity of a biological sample of a subject for informing a cancer feature and evaluating a treatment efficacy for the subject, the method comprising:
obtaining nucleic acid sequence data from one or more sequencers that represent a plurality of nucleic acid molecules of the biological sample of the subject;
aligning the nucleic acid sequence data to a reference genome;
identifying, based on the aligned nucleic acid sequence data, a set of genomic regions, wherein each genomic region of the set of genomic regions includes one or more nucleotide-sequence variants relative to a corresponding genomic region of the reference genome;
determining a B-allele frequency for each genomic region of the set of genomic regions;
determining, based on the B-allele frequencies of the set of genomic regions, a B-allele frequency distribution for the biological sample;
processing the B-allele frequency distribution using a trained machine-learning model to estimate a probability of a true tumor purity as a function of a predicted tumor purity in the biological sample, wherein the trained machine-learning model is trained on a training dataset generated from nucleic acid sequence data derived from one or more tumor cells diluted into normal cells; and
generating a report to inform the cancer feature and evaluate the treatment efficacy for the subject based on the estimated probability of a true tumor purity as a function of a predicted tumor purity in the biological sample.
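The data flow recited in claim 1 (per-region B-allele frequency, a sample-level B-allele frequency distribution, a model trained on tumor-into-normal dilutions, and a probability of true purity given predicted purity) can be illustrated with the minimal Python sketch below. It is not the claimed implementation. Sequencing, alignment, and variant calling are assumed to have already produced per-site reference and alternate read counts. Every function name is hypothetical. The dilution simulator assumes diploid cells and copy-neutral loss of heterozygosity (LOH) in a fraction of regions. The ridge regressor and Gaussian-kernel calibration are simple stand-ins for the unspecified "trained machine-learning model".

```python
import numpy as np


def region_baf(ref_reads, alt_reads):
    """B-allele frequency of one genomic region.

    Assumption: reads are pooled over all variant sites in the region, so the
    region BAF is total alternate reads / total depth.
    """
    ref = np.asarray(ref_reads, dtype=float)
    alt = np.asarray(alt_reads, dtype=float)
    depth = ref.sum() + alt.sum()
    return alt.sum() / depth if depth > 0 else float("nan")


def baf_distribution(bafs, n_bins=20):
    """Normalized histogram of region BAFs on [0, 1]; NaN regions are dropped."""
    values = np.asarray(bafs, dtype=float)
    values = values[~np.isnan(values)]
    counts, _ = np.histogram(values, bins=n_bins, range=(0.0, 1.0))
    total = counts.sum()
    return counts / total if total > 0 else np.zeros(n_bins)


def simulate_dilution(purity, rng, n_regions=2000, depth=100, loh_fraction=0.3):
    """Hypothetical in-silico dilution of tumor cells into normal cells.

    Assumes diploid cells, one heterozygous site per region, and copy-neutral
    LOH in a fraction of regions, where the expected BAF is 0.5 +/- purity / 2.
    Returns the BAF distribution of the simulated mixture.
    """
    loh = rng.random(n_regions) < loh_fraction
    sign = rng.choice([-1.0, 1.0], size=n_regions)
    expected = 0.5 + np.where(loh, sign * purity / 2.0, 0.0)
    alt = rng.binomial(depth, expected)  # sampled alternate-read counts
    return baf_distribution(alt / depth)


def fit_ridge(X, y, alpha=1e-3):
    """Ridge regression (with intercept column) from BAF distribution to purity."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)


def predict_purity(w, X):
    """Predicted purity, clipped to the valid range [0, 1]."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.clip(Xb @ w, 0.0, 1.0)


def true_purity_probability(q, q_train, t_train, grid, bandwidth=0.05):
    """P(true purity = grid[k] | predicted purity = q).

    Training pairs whose prediction lies near q receive large Gaussian weights;
    the weights are summed per true-purity level and normalized.
    """
    weights = np.exp(-0.5 * ((q_train - q) / bandwidth) ** 2)
    probs = np.array([weights[np.isclose(t_train, g)].sum() for g in grid])
    total = probs.sum()
    return probs / total if total > 0 else np.full(len(grid), 1.0 / len(grid))


def train(rng, grid, reps=20):
    """Build a dilution-series training set and fit the purity model."""
    X = np.array([simulate_dilution(p, rng) for p in grid for _ in range(reps)])
    t = np.repeat(grid, reps)  # known (true) purity of each simulated mixture
    w = fit_ridge(X, t)
    return w, predict_purity(w, X), t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.round(np.linspace(0.0, 1.0, 11), 2)
    w, q_train, t_train = train(rng, grid)
    sample = simulate_dilution(0.55, rng)  # stand-in for a real sample's distribution
    q = predict_purity(w, sample[None, :])[0]
    probs = true_purity_probability(q, q_train, t_train, grid)
    print(f"predicted purity {q:.2f}; most probable true purity {grid[probs.argmax()]}")
```

For brevity, the calibration step reuses in-sample predictions from the dilution series. A held-out set of diluted samples would give a more honest estimate of the probability of true purity as a function of predicted purity.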