US 12,254,959 B2
Identification, characterization, and quantitation of CRISPR-introduced double-stranded DNA break repairs
Heng Li, Wellesley, MA (US); Gavin Kurgan, Iowa City, IA (US); Matthew McNeill, Iowa City, IA (US); and Yu Wang, North Grafton, MA (US)
Assigned to INTEGRATED DNA TECHNOLOGIES, INC., Coralville, IA (US)
Filed by INTEGRATED DNA TECHNOLOGIES, INC., Coralville, IA (US)
Filed on Jul. 2, 2020, as Appl. No. 16/919,577.
Claims priority of provisional application 62/952,603, filed on Dec. 23, 2019.
Claims priority of provisional application 62/952,598, filed on Dec. 23, 2019.
Claims priority of provisional application 62/870,426, filed on Jul. 3, 2019.
Claims priority of provisional application 62/870,471, filed on Jul. 3, 2019.
Prior Publication US 2021/0002700 A1, Jan. 7, 2021
Int. Cl. G16B 20/30 (2019.01); G16B 20/20 (2019.01); G16B 30/10 (2019.01); G16B 40/20 (2019.01)

CPC G16B 20/30 (2019.02) [G16B 20/20 (2019.02); G16B 30/10 (2019.02); G16B 40/20 (2019.02)]

9 Claims

1. A process for identifying and characterizing double-stranded DNA (dsDNA) break repair sites with improved accuracy, the process comprising:

(a) performing CRISPR-Cas editing in a population of cells or tissue with one or more guide RNAs to produce edited genomic DNA;

(b) extracting the edited genomic DNA from the population of cells or tissue;

(d) performing next generation sequencing of the amplicons enriched for target-site sequences and obtaining genomic sample sequence data enriched for target-site sequences;

subsequently executing on a processor the steps of:

(e) receiving the genomic sample sequence data enriched for target-site sequences comprising a plurality of sequences;

(f) merging the sample genomic sequence data enriched for target-site sequences and outputting merged sequences;

(g) developing predicted target-site sequences for the genome containing predicted dsDNA break repair events when a single-stranded or a double-stranded DNA oligonucleotide donor is provided and outputting the predicted target-site sequences;

(h) binning the merged sequences by their alignment to the genome using a mapper and outputting binned target-read alignments;

(i) re-aligning the binned target-read alignments from step (h) to the target-site sequences from step (g) using an aligner weighed by a Cas-enzyme-specific position-specific full gap open and gap extension multiple bonus scoring matrix to simultaneously and preferentially align multiple editing events within a defined sequence distance window of the predicted dsDNA break repair events for each Cas enzyme and each guide RNA and producing a final alignment;

wherein the multiple bonus scoring matrix uses position-specific gap open and gap extension variable penalty vectors to favor alignment of multiple editing events occurring at or near the predicted dsDNA break repair events, and

the multiple bonus scoring matrix is derived from biological editing data at canonical Cas enzyme cut sites and the position of each guide RNA;

(j) analyzing the final alignment and identifying and quantifying editing events within the defined sequence distance window of the predicted dsDNA break repair events for each Cas enzyme and each guide RNA; and

(k) outputting the final alignment, percent editing, percent insertion, percent deletion, or a combination thereof as tables or graphics and selecting one or more guide RNAs with effective CRISPR-Cas editing based on the quantification data from the final alignment, percent editing, percent insertion, percent deletion, or combination thereof; and

(l) using the one or more selected guide RNAs in further CRISPR-Cas editing experiments.
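
Illustrative sketch (not part of the claim): the re-alignment of step (i) and the quantification of steps (j) and (k) can be pictured as an affine-gap global alignment in which the gap open and gap extension penalties vary by reference position and are lowest at the predicted cut site. Everything below is an assumption made for illustration: the function names, the score values, the linear decay of the bonus across the window, and the toy sequences are hypothetical. The claim states that the scoring matrix is derived from biological editing data, which this sketch does not attempt to reproduce.

```python
NEG = float("-inf")


def gap_penalty_vectors(ref_len, cut_index, window=3,
                        base_open=6.0, base_ext=1.0, bonus=0.8):
    """Hypothetical penalty vectors: cheapest at the cut site, rising
    linearly to the base penalty just outside the window."""
    gap_open, gap_ext = [], []
    for j in range(ref_len):
        closeness = max(0.0, 1.0 - abs(j - cut_index) / (window + 1))
        scale = 1.0 - bonus * closeness
        gap_open.append(base_open * scale)
        gap_ext.append(base_ext * scale)
    return gap_open, gap_ext


def align(read, ref, gap_open, gap_ext, match=2.0, mismatch=-3.0):
    """Global affine-gap alignment (Gotoh-style, three states) with
    penalties indexed by reference position. Assumes a non-empty ref.
    States: 0 = match/mismatch, 1 = deletion, 2 = insertion."""
    m, n = len(read), len(ref)
    score = [[[NEG] * (n + 1) for _ in range(m + 1)] for _ in range(3)]
    back = [[[0] * (n + 1) for _ in range(m + 1)] for _ in range(3)]
    score[0][0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i and j:  # align read[i-1] with ref[j-1]
                sub = match if read[i - 1] == ref[j - 1] else mismatch
                prev = max(range(3), key=lambda s: score[s][i - 1][j - 1])
                score[0][i][j] = score[prev][i - 1][j - 1] + sub
                back[0][i][j] = prev
            if j:  # delete ref[j-1] from the read
                cands = [score[0][i][j - 1] - gap_open[j - 1],
                         score[1][i][j - 1] - gap_ext[j - 1],
                         score[2][i][j - 1] - gap_open[j - 1]]
                prev = max(range(3), key=cands.__getitem__)
                score[1][i][j] = cands[prev]
                back[1][i][j] = prev
            if i:  # insert read[i-1] before ref position j
                k = min(j, n - 1)
                cands = [score[0][i - 1][j] - gap_open[k],
                         score[1][i - 1][j] - gap_open[k],
                         score[2][i - 1][j] - gap_ext[k]]
                prev = max(range(3), key=cands.__getitem__)
                score[2][i][j] = cands[prev]
                back[2][i][j] = prev
    # Trace back from the best-scoring end state.
    state = max(range(3), key=lambda s: score[s][m][n])
    i, j, out_read, out_ref = m, n, [], []
    while i or j:
        prev = back[state][i][j]
        if state == 0:
            i, j = i - 1, j - 1
            out_read.append(read[i])
            out_ref.append(ref[j])
        elif state == 1:
            j -= 1
            out_read.append("-")
            out_ref.append(ref[j])
        else:
            i -= 1
            out_read.append(read[i])
            out_ref.append("-")
        state = prev
    return "".join(reversed(out_read)), "".join(reversed(out_ref))


def is_edited(aligned_read, aligned_ref, cut_index, window=3):
    """True if any gap column lies within `window` bases of the cut site."""
    ref_pos = 0
    for r, g in zip(aligned_read, aligned_ref):
        if (r == "-" or g == "-") and abs(ref_pos - cut_index) <= window:
            return True
        if g != "-":
            ref_pos += 1
    return False


def percent_editing(reads, ref, cut_index, window=3):
    """Percentage of reads carrying an indel inside the window."""
    gap_open, gap_ext = gap_penalty_vectors(len(ref), cut_index, window)
    edited = sum(
        is_edited(*align(read, ref, gap_open, gap_ext), cut_index, window)
        for read in reads
    )
    return 100.0 * edited / len(reads)


if __name__ == "__main__":
    ref = "GGAAAAAACC"   # toy reference with a homopolymer run
    read = "GGAAAAACC"   # same sequence with one A deleted
    for cut in (5, 3):
        go, ge = gap_penalty_vectors(len(ref), cut)
        print(cut, *align(read, ref, go, ge))
```

In the toy example the deleted A could be placed anywhere in the homopolymer run under a uniform gap penalty. With position-specific penalties, the deletion is placed at whichever reference position is cheapest, so moving the assumed cut site from index 5 to index 3 moves the reported deletion with it.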