US 12,315,601 B2
Linear genome assembly from three dimensional genome structure
Erez Aiden, Houston, TX (US); Olga Dudchenko, Houston, TX (US); Aviva Aiden, Houston, TX (US); Elena Stamenova, Cambridge, MA (US); Sanjit Singh Batra, Houston, TX (US); Arina Omer, Houston, TX (US); Per Aspera Adastra, Houston, TX (US); Neva Durand, Houston, TX (US); Maxim Massenkoff, Cambridge, MA (US); Sarah Nyquist, Cambridge, MA (US); Anthony Tzen, Houston, TX (US); Christopher Lui, Houston, TX (US); Melanie Pham, Houston, TX (US); and Eric Lander, Cambridge, MA (US)
Assigned to THE BROAD INSTITUTE, INC., Cambridge, MA (US); and BAYLOR COLLEGE OF MEDICINE, Houston, TX (US)
Appl. No. 16/308,386
Filed by THE BROAD INSTITUTE, INC., Cambridge, MA (US); and BAYLOR COLLEGE OF MEDICINE, Houston, TX (US)
PCT Filed Jun. 8, 2017, PCT No. PCT/US2017/036649
§ 371(c)(1), (2) Date Dec. 7, 2018,
PCT Pub. No. WO2017/214461, PCT Pub. Date Dec. 14, 2017.
Claims priority of provisional application 62/347,605, filed on Jun. 8, 2016.
Claims priority of provisional application 62/475,808, filed on Mar. 23, 2017.
Claims priority of provisional application 62/471,777, filed on Mar. 15, 2017.
Claims priority of provisional application 62/374,475, filed on Aug. 12, 2016.
Prior Publication US 2019/0385703 A1, Dec. 19, 2019
Int. Cl. G01N 33/48 (2006.01); C12Q 1/6869 (2018.01); G01N 33/50 (2006.01); G16B 5/10 (2019.01); G16B 15/10 (2019.01); G16B 30/20 (2019.01)
CPC G16B 5/10 (2019.02) [C12Q 1/6869 (2013.01); G16B 15/10 (2019.02); G16B 30/20 (2019.02)]
25 Claims
 
1. A method for assembly of one or more long DNA molecules comprising:
a) performing a DNA proximity ligation assay conducted on one or more samples;
b) generating a draft assembly of contigs and scaffolds from input sequencing reads obtained, at least in part, from the DNA proximity ligation assay conducted on one or more samples;
c) assembling larger sequences corresponding to one or more DNA molecules in the one or more samples by iteratively overlapping, ordering, orienting, and merging the contigs and scaffolds in the draft assembly, wherein assembling larger sequences is determined, at least in part, by application of a greedy algorithm, an optimization algorithm, or a manual annotation algorithm;
d) performing misjoin correction on the scaffolds, wherein the misjoin correction uses contact frequency between sequences in the scaffolds generated from a contact matrix to determine one or more misjoins;
e) generating one or more megascaffolds from the corrected scaffolds, wherein generating one or more megascaffolds comprises using a density graph to construct hemi-scaffolds from the corrected scaffolds and transforming the density graph into a confidence graph, the confidence graph constructs one or more megascaffolds from the hemi-scaffolds; and
f) generating a final assembly from the megascaffolds.
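
By way of illustration only, the use of contact frequency in step d) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed method: the function names (`cross_contacts`, `find_candidate_misjoins`), the window `width`, the `fraction` cutoff, and the toy matrix are all hypothetical and are not taken from the claim. The idea shown is that loci adjacent in one DNA molecule tend to contact each other frequently, so a position along a scaffold where contacts between its left and right neighbors drop well below the typical level is a candidate misjoin.

```python
from statistics import median


def cross_contacts(matrix, pos, width):
    """Sum contacts between the `width` bins left of `pos` and the `width` bins right of it."""
    return sum(
        matrix[i][j]
        for i in range(pos - width, pos)
        for j in range(pos, pos + width)
    )


def find_candidate_misjoins(matrix, width=2, fraction=0.2):
    """Return bin boundaries whose cross-boundary contact signal is unusually low.

    `matrix` is a square, symmetric contact matrix for one scaffold (list of lists).
    `width` and `fraction` are illustrative parameters, not values from the claim.
    """
    n = len(matrix)
    # Only boundaries with a full window on both sides are scored.
    positions = range(width, n - width + 1)
    signals = {p: cross_contacts(matrix, p, width) for p in positions}
    if not signals:
        return []
    # Flag boundaries whose signal falls below a fraction of the median signal.
    cutoff = fraction * median(signals.values())
    return [p for p, s in signals.items() if s < cutoff]


# Toy scaffold of 8 bins built from two unrelated 4-bin segments:
# contacts are frequent within a segment and absent between segments.
toy = [[10 if i // 4 == j // 4 else 0 for j in range(8)] for i in range(8)]
print(find_candidate_misjoins(toy))  # prints [4]
```

In this toy input the only flagged boundary is the junction between the two segments, where the scaffold would be split before the corrected pieces are passed on to later steps. A real contact matrix would also require normalization and handling of repeats and low-coverage regions, which this sketch omits.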