| CPC G16B 5/10 (2019.02) [C12Q 1/6869 (2013.01); G16B 15/10 (2019.02); G16B 30/20 (2019.02)] | 25 Claims |
|
1. A method for assembly of one or more long DNA molecules comprising:
a) performing a DNA proximity ligation assay conducted on one or more samples;
b) generating a draft assembly of contigs and scaffolds from input sequencing reads obtained, at least in part, from the DNA proximity ligation assay conducted on one or more samples;
c) assembling larger sequences corresponding to one or more DNA molecules in the one or more samples by iteratively overlapping, ordering, orienting, and merging the contigs and scaffolds in the draft assembly, wherein assembling larger sequences is determined, at least in part, by application of a greedy algorithm, an optimization algorithm, or a manual annotation algorithm;
d) performing misjoin correction on the scaffolds, wherein the misjoin correction uses contact frequency between sequences in the scaffolds generated from a contact matrix to determine one or more misjoins;
e) generating one or more megascaffolds from the corrected scaffolds, wherein generating one or more megascaffolds comprises using a density graph to construct hemi-scaffolds from the corrected scaffolds and transforming the density graph into a confidence graph, and wherein the confidence graph constructs one or more megascaffolds from the hemi-scaffolds; and
f) generating a final assembly from the megascaffolds.
|
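Step d) of claim 1 recites using contact frequency from a contact matrix to determine misjoins, without fixing a particular scoring rule. The following is a minimal illustrative sketch of one way such a check could work, not the claimed implementation. It assumes the scaffold has been divided into equal-sized bins, that `matrix[i][j]` holds the proximity ligation contact count between bins `i` and `j`, and that a boundary whose cross-boundary contact falls far below the scaffold's typical value is a misjoin candidate. The function names, the `window` and `fraction` parameters, and the `toy_matrix` test data are all hypothetical.

```python
from statistics import median


def cross_boundary_scores(matrix, window):
    """Mean contact count across each internal bin boundary of a scaffold.

    Boundary b lies between bin b-1 and bin b. The score averages contacts
    between the `window` bins left of b and the `window` bins right of b.
    """
    n = len(matrix)
    scores = {}
    for b in range(window, n - window + 1):
        total = 0.0
        for i in range(b - window, b):
            for j in range(b, b + window):
                total += matrix[i][j]
        scores[b] = total / (window * window)
    return scores


def find_misjoins(matrix, window=2, fraction=0.2):
    """Return boundaries whose score is below `fraction` of the median score.

    The median-relative threshold is an assumption made for this sketch.
    """
    scores = cross_boundary_scores(matrix, window)
    if not scores:
        return []
    typical = median(scores.values())
    return [b for b, s in sorted(scores.items()) if s < fraction * typical]


def toy_matrix(block_sizes, inside=10.0, outside=0.1):
    """Hypothetical test data: contacts decay with distance inside a block
    (one true molecule) and are uniformly weak between blocks."""
    owner = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(owner)
    return [
        [inside / (1 + abs(i - j)) if owner[i] == owner[j] else outside
         for j in range(n)]
        for i in range(n)
    ]


# A scaffold that wrongly joins two 4-bin molecules is flagged at bin 4.
print(find_misjoins(toy_matrix([4, 4])))  # [4]
# A scaffold drawn from a single molecule yields no candidates.
print(find_misjoins(toy_matrix([8])))     # []
```

In this sketch a flagged boundary would be the point at which a scaffold is split before the corrected pieces are passed on to megascaffold construction in step e). Real contact data would likely need normalization for coverage and bin mappability before any such threshold is meaningful.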