| CPC C12Q 1/6886 (2013.01) [C12Q 1/6869 (2013.01); G16B 20/00 (2019.02); G16H 50/20 (2018.01); C12Q 2600/154 (2013.01)] | 11 Claims |
|
1. A method of identifying differentially methylated regions (DMRs) for use in a methylation assay for determination of the presence and/or absence of a cancer in a human subject, the method comprising:
(a) generating, by using whole genome bisulfite sequencing, sequencing reads of DNA from tissue samples obtained from patients from a patient group known to have the cancer, said reads including reads of a first CpG dinucleotide, wherein the first CpG dinucleotide is identified according to a reference genome and at a specific known genomic position in the DNA, wherein the patient group known to have the cancer comprises at least 15 patients;
(b) generating, by a processor of a computing device, a collection of data points (T, M) from the sequencing reads of DNA of the tissue samples obtained from the patients from the patient group known to have the cancer,
wherein each of the data points comprises (i) a total number of sequence reads (T) of the first CpG dinucleotide, and (ii) a number of methylated sequence reads (M) of the cytosine of the first CpG dinucleotide;
(c) calculating, by the processor, a linear slope (βp) of an M vs. T (or T vs. M) plot of the collection of data points (T, M) from the samples obtained from the patient group known to have the cancer,
wherein calculating the linear slope comprises performing, for a first cross-plot, a linear regression to identify a line having a slope (βp) and an intercept, and a standard error (σp), wherein the intercept of the linear regression line is fixed at the origin (0,0),
wherein the linear slope is a measure of methylation proportion (βp) for the first CpG dinucleotide;
(d) calculating, by the processor, an angle (θ) between a horizontal line extending from data point (σc, βc) in a second cross-plot and a directional vector connecting the data point (σc, βc) with data point (σp, βp), wherein the data point (σc, βc) corresponds to a standard error and a slope of an M vs. T (or T vs. M) plot of a collection of data points (T, M) for the first CpG dinucleotide from tissue samples obtained from a control group of human subjects, wherein the control group of human subjects comprises at least 15 human subjects;
(e) determining, by the processor, whether the first CpG dinucleotide is a high confidence differentially methylated CpG dinucleotide (DMC) based at least in part on said methylation proportion (βp) and said angle (θ);
(f) performing steps (b)-(e) for a plurality of additional CpG dinucleotides in the genome to identify additional high confidence DMCs, wherein the plurality of additional CpG dinucleotides total at least 25,000 CpGs;
(g) classifying, by the processor, the high confidence DMCs into a plurality of clusters based on, at least, the respective angles (θ);
(h) constructing at least one DMR from one of the plurality of clusters, wherein said DMR comprises at least 3 high confidence DMCs from said cluster and wherein each of the at least 3 high confidence DMCs is within 50 base pairs of another of the at least 3 high confidence DMCs;
(i) bisulfite or enzymatically treating a sample comprising DNA obtained from a human subject having an unknown cancer status;
(j) amplifying the at least one DMR from the treated DNA sample using a pair of primers for each DMR to generate a DNA sequencing library; and
(k) detecting, using a methylation assay, a methylation status of the at least one DMR from the DNA sequencing library.
|
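The computational steps (c), (d), and (h) of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on assumptions the claim does not spell out: that the standard error (σ) is the ordinary least-squares standard error of the slope for a fit through the origin with n - 1 residual degrees of freedom, that the angle (θ) is measured counterclockwise from a horizontal line pointing toward increasing σ, and that "within 50 base pairs of another" can be satisfied by chaining position-sorted DMCs whose neighboring gaps are at most 50 bp. The function names `slope_through_origin`, `angle_deg`, and `build_dmrs` are hypothetical.

```python
import math


def slope_through_origin(points):
    """Fit M = beta * T with the intercept fixed at (0, 0).

    points: list of (T, M) pairs, one per sample, for a single CpG.
    Returns (beta, sigma), where sigma is ASSUMED to be the standard
    error of the slope with n - 1 residual degrees of freedom.
    """
    n = len(points)
    sum_tt = sum(t * t for t, _ in points)
    if n < 2 or sum_tt == 0:
        raise ValueError("need at least 2 samples with nonzero coverage")
    sum_tm = sum(t * m for t, m in points)
    beta = sum_tm / sum_tt
    rss = sum((m - beta * t) ** 2 for t, m in points)
    sigma = math.sqrt(rss / (n - 1) / sum_tt)
    return beta, sigma


def angle_deg(control, patient):
    """Angle between a horizontal line through the control point and the
    vector from control (sigma_c, beta_c) to patient (sigma_p, beta_p).

    ASSUMPTION: counterclockwise from the +sigma direction, in degrees,
    in the range (-180, 180].
    """
    (sigma_c, beta_c), (sigma_p, beta_p) = control, patient
    return math.degrees(math.atan2(beta_p - beta_c, sigma_p - sigma_c))


def build_dmrs(positions, max_gap=50, min_dmcs=3):
    """Group the DMC positions of one cluster into candidate DMRs.

    ASSUMPTION: a DMR is a run of position-sorted DMCs in which each
    neighboring gap is at most max_gap bp, keeping runs of >= min_dmcs.
    Returns a list of (start, end) coordinates.
    """
    runs, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            runs.append(current)
            current = []
        current.append(pos)
    if current:
        runs.append(current)
    return [(run[0], run[-1]) for run in runs if len(run) >= min_dmcs]


if __name__ == "__main__":
    # Toy (T, M) data for one CpG; real groups would have >= 15 samples.
    patient_fit = slope_through_origin([(10, 8), (20, 17), (30, 23)])
    control_fit = slope_through_origin([(10, 1), (20, 3), (30, 2)])
    print("patient (beta, sigma):", patient_fit)
    print("control (beta, sigma):", control_fit)
    # angle_deg expects (sigma, beta) ordering, as in the second cross-plot.
    theta = angle_deg(control_fit[::-1], patient_fit[::-1])
    print("theta (degrees):", theta)
    print("DMRs:", build_dmrs([100, 130, 170, 400, 420, 900]))
```

In this sketch, a CpG whose methylation proportion rises in the patient group with little change in standard error yields an angle near 90 degrees, while one that falls yields an angle near -90 degrees. The thresholds on βp and θ used in step (e), and the clustering rule of step (g), are not specified in the claim and are deliberately left out.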