CPC C12Q 1/6869 (2013.01) [C12Q 1/6844 (2013.01)] | 18 Claims |
1. A method of analyzing a single mixed contributor sample containing DNA where the number of contributors is unknown prior to the analysis of the single mixed contributor sample,
the single mixed contributor sample comprising loci within the DNA,
the method comprising:
(i) processing the single mixed contributor sample into a plurality of liquid droplets, wherein each of the plurality of liquid droplets contains no more than one locus-containing DNA molecule per targeted locus,
wherein isolating the target DNA molecules with the targeted loci in the droplets prevents extended primers from switching between multiple target DNA molecules with different sequences during PCR, thereby preventing creation of chimeric alleles comprising error-containing content due to recombination of genomic content from a plurality of target DNA molecules;
(ii) introducing a plurality of sets of DNA primers into the plurality of liquid droplets, wherein each of the plurality of sets is configured to amplify a different specific locus on a genome, and wherein each of the plurality of sets includes one or more unique molecular tags, and wherein the unique molecular tag(s) of a primer set in one droplet differs from the unique molecular tag(s) of the corresponding primer set in a different droplet;
(iii) subjecting the target DNA molecules in the plurality of liquid droplets to an amplification process comprising a PCR amplification process in the presence of the plurality of sets of DNA primers to provide a plurality of amplicons of targeted DNA sequences at a plurality of pre-determined loci including a first locus,
each of the sets of DNA primers being configured to incorporate into the respective plurality of amplicons and thereby append a tag to the plurality of respective amplicons to produce a plurality of sets of uniquely tagged amplicons derived from each of the plurality of targeted DNA sequences,
wherein each set of the plurality of sets of uniquely tagged amplicons comprises a respective allelic profile including a first allelic profile comprising error-containing amplicons having DNA sequence errors generated by replication errors of the amplification process and error-free amplicons having no DNA sequence errors, the error-containing amplicons and the error-free amplicons having a same tag;
(iv) sequencing each of the uniquely tagged amplicons to provide at least a first group of sequences having an identical first tag associated with the first locus and a second group of sequences having an identical second tag associated with the first locus; and
(v) selecting a first representative DNA sequence from the first group of sequences as representing a first target DNA molecule from the one or more target DNA sequences and selecting a second representative DNA sequence from the second group of sequences as representing a second target DNA molecule;
wherein the first representative DNA sequence is a majority sequence from the first group of sequences and the second representative DNA sequence is a majority sequence from the second group of sequences; and
(vi) associating the first representative DNA sequence with a first contributor genotype at the first locus and the second representative DNA sequence with a second contributor genotype at the first locus via performing an Evidence Ratio (ER) analysis derived from Akaike's Information Criterion (AIC) for the first allelic profile that includes the first group of sequences having an identical first tag and the second group of sequences having the identical second tag, the number of DNA contributors at the locus, c, to be inferenced based on the weight of the evidence; wherein the ER analysis derived from the AIC computed for each possible model of the allelic data with ‘n’ contributors determines the strength of evidence for each possible number of contributors, n, and comprises the following:
wherein ‘n’ represents the number of contributors;
wm′* represents each model's Akaike weight as defined in terms of ratios of the model's likelihood (L_m) given the allelic data, where L_m is proportional to exp(−Δ_m/2), wherein Δ_m is the difference of the AIC value between model ‘m’ and a model ‘m_min’, in which m_min has the lowest AIC value and is not zero;
wherein the upper sum in ERcn is over a subset r(n) of all possible models with n contributors, and the lower sum is over all possible models with a different number of contributors n′,
where n′ is not equal to n, and n_max is the maximum number of contributors considered, with the number of contributors inferenced given by m such that the ERcn value is the maximum over all n_max considered.
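The expression introduced by "comprises the following" does not appear in this text. One reading consistent with the definitions above is sketched here as an assumption, not as the claimed formula: the Akaike weight takes its standard normalized form, and the evidence ratio divides the summed weights of the models in r(n) by the summed weights of the models with any other contributor count. The subscript placement in ERcn, the index set M(n′) of models with n′ contributors, and the omission of the asterisk on w are guesses made for illustration.

```latex
\Delta_m = \mathrm{AIC}_m - \mathrm{AIC}_{m_{\min}}, \qquad
w_m = \frac{\exp\left(-\tfrac{1}{2}\Delta_m\right)}{\sum_{j}\exp\left(-\tfrac{1}{2}\Delta_j\right)}, \qquad
\mathrm{ER}_{cn} = \frac{\sum_{m \in r(n)} w_m}
                        {\sum_{\substack{n' = 1 \\ n' \neq n}}^{n_{\max}} \; \sum_{m' \in M(n')} w_{m'}}
```

Under this reading, the inferred number of contributors is the n in 1, ..., n_max whose evidence ratio is largest.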
|
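As an illustration of steps (v) and (vi), the following minimal Python sketch picks the majority sequence within each tag group, then converts per-model AIC values into Akaike weights and an evidence ratio for each contributor count. The function names `majority_sequence`, `akaike_weights` and `evidence_ratios`, the toy reads, the model labels and the AIC values are all hypothetical. The sketch also assumes that every model with n contributors belongs to r(n) and that the ratio is summed weight for n over summed weight for all other n′, which is one interpretation of the definitions in the claim and not a formula reproduced from it.

```python
import math
from collections import Counter, defaultdict


def majority_sequence(reads):
    """Return the most frequent sequence in one tag group (step v).

    Ties are resolved in favor of the sequence seen first.
    """
    return Counter(reads).most_common(1)[0][0]


def akaike_weights(aic_by_model):
    """Map each model to exp(-delta/2), normalized over all models."""
    aic_min = min(aic_by_model.values())
    relative = {m: math.exp(-0.5 * (aic - aic_min))
                for m, aic in aic_by_model.items()}
    total = sum(relative.values())
    return {m: value / total for m, value in relative.items()}


def evidence_ratios(aic_by_model, n_of_model):
    """Assumed ER per contributor count n: weight for n over weight for all n' != n."""
    weights = akaike_weights(aic_by_model)
    weight_by_n = defaultdict(float)
    for model, w in weights.items():
        weight_by_n[n_of_model[model]] += w
    ratios = {}
    for n, w in weight_by_n.items():
        rest = sum(v for k, v in weight_by_n.items() if k != n)
        ratios[n] = w / rest if rest > 0 else math.inf
    return ratios


# Hypothetical reads at one locus, grouped by (locus, tag).
reads_by_tag = {
    ("locus1", "TAG_A"): ["ACGT", "ACGT", "ACGA", "ACGT"],
    ("locus1", "TAG_B"): ["ACCT", "ACCT", "ACGT"],
}
representatives = {key: majority_sequence(reads)
                   for key, reads in reads_by_tag.items()}

# Hypothetical AIC values for candidate models and their contributor counts.
aic_by_model = {"m1": 110.0, "m2a": 100.0, "m2b": 102.0, "m3": 104.0}
n_of_model = {"m1": 1, "m2a": 2, "m2b": 2, "m3": 3}

weights = akaike_weights(aic_by_model)
ratios = evidence_ratios(aic_by_model, n_of_model)
best_n = max(ratios, key=ratios.get)
```

With these toy inputs, the minority reads in each tag group are discarded as presumed amplification errors, and the contributor count with the largest ratio is selected. A real analysis would obtain the AIC values from fitted genotype models, which this sketch does not attempt.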