CPC G16B 30/10 (2019.02) [G16B 20/00 (2019.02)] | 15 Claims |
1. A method for measuring nucleic acid sequence abundances comprising:
a. obtaining on a processor first data that indicates a target sequence of nucleic acid bases at a plurality of loci, wherein the target sequence comprises a plurality of bins, each bin comprising thousands to millions of loci;
b. measuring a sample from a subject to obtain thousands of reads of DNA sequences along fragments in the sample, wherein read lengths are hundreds of nucleotides, which reads are input to the processor;
c. determining on the processor second data that indicates alignment with the target sequence of the reads;
d. obtaining on the processor third data that indicates locus dependent observed non-uniform biases in coverage;
e. determining on the processor a raw count of reads that start at each locus;
f. obtaining partition data that indicates, for a first partition, a window comprising a number of bases less than a number of loci in a bin and position relative to a current locus, and a plurality of strata based on a corresponding plurality of different contents of nucleic acid bases in the window;
g. attributing to each locus in the target sequence a stratum of the plurality of strata of the first partition based on the content of nucleic acid bases in the target sequence in the window relative to the locus;
h. determining an expected count of each stratum in the first partition based on the raw counts of each locus belonging to the stratum and based on a total number of loci in the target sequence belonging to the stratum;
i. determining on the processor a copy number of a first bin based on a sum over all loci in the first bin of the expected count of each stratum for the first partition with each expected count weighted by the locus dependent observed non-uniform biases in coverage; and
j. presenting on a display, output data that indicates the copy number of the first bin in the sample wherein the copy number of the first bin relative to a copy number of a different bin is indicative of a condition of interest.
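For illustration only, steps e through i of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The claim does not fix the stratification rule, so the count of G/C bases in the window is assumed here. It also does not fix the formula that turns the weighted sum into a copy number, so the ratio of observed to bias-weighted expected counts is assumed. All function names, the toy sequence, and the uniform bias weights are hypothetical.

```python
from collections import defaultdict


def raw_start_counts(read_starts, n_loci):
    """Step e: raw count of aligned reads that start at each locus."""
    counts = [0] * n_loci
    for start in read_starts:
        if 0 <= start < n_loci:
            counts[start] += 1
    return counts


def assign_strata(target, window, offset=0):
    """Steps f-g: attribute a stratum to each locus.

    Assumption: the stratum is the number of G/C bases in the window that
    begins at locus + offset. Loci whose window runs off the target
    sequence are left unassigned (None).
    """
    n = len(target)
    strata = [None] * n
    for locus in range(n):
        lo = locus + offset
        hi = lo + window
        if lo < 0 or hi > n:
            continue
        strata[locus] = sum(1 for base in target[lo:hi] if base in "GCgc")
    return strata


def expected_per_stratum(raw, strata):
    """Step h: total raw count in a stratum divided by its number of loci."""
    total = defaultdict(int)
    size = defaultdict(int)
    for count, stratum in zip(raw, strata):
        if stratum is None:
            continue
        total[stratum] += count
        size[stratum] += 1
    return {s: total[s] / size[s] for s in size}


def relative_copy_number(raw, strata, expected, bias, bin_start, bin_end):
    """Step i (assumed form): observed reads in the bin divided by the
    bias-weighted sum of expected counts over the loci of the bin."""
    observed = 0
    predicted = 0.0
    for locus in range(bin_start, bin_end):
        if strata[locus] is None:
            continue
        observed += raw[locus]
        predicted += bias[locus] * expected[strata[locus]]
    return observed / predicted if predicted > 0 else float("nan")


# Toy data (hypothetical): an 8-base target split into two 4-locus bins,
# with the second bin covered about three times as deeply as the first.
target = "AAGGAAGG"
per_locus = [1, 2, 3, 2, 3, 6, 9, 0]
read_starts = [i for i, c in enumerate(per_locus) for _ in range(c)]

raw = raw_start_counts(read_starts, len(target))
strata = assign_strata(target, window=2)
expected = expected_per_stratum(raw, strata)
bias = [1.0] * len(target)  # assumed uniform locus-dependent bias weights

ratio_a = relative_copy_number(raw, strata, expected, bias, 0, 4)
ratio_b = relative_copy_number(raw, strata, expected, bias, 4, 8)
print(ratio_a, ratio_b)
```

In this toy run the second bin yields a larger ratio than the first, which is the kind of bin-to-bin comparison that step j presents. Real bins would span thousands to millions of loci, and the bias weights would come from the third data of step d rather than a constant.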