US 12,406,749 B2
Systems and methods for predicting repair outcomes in genetic engineering
Max Walt Shen, Cambridge, MA (US); Jonathan Yee-Ting Hsu, Cambridge, MA (US); Mandana Arbab, Cambridge, MA (US); David K. Gifford, Cambridge, MA (US); David R. Liu, Cambridge, MA (US); and Richard Irving Sherwood, Cambridge, MA (US)
Assigned to The Broad Institute, Inc., Cambridge, MA (US); Massachusetts Institute of Technology, Cambridge, MA (US); The Brigham and Women's Hospital, Inc., Boston, MA (US); and President and Fellows of Harvard College, Cambridge, MA (US)
Appl. No. 16/772,747
Filed by The Broad Institute, Inc., Cambridge, MA (US); Massachusetts Institute of Technology, Cambridge, MA (US); The Brigham and Women's Hospital, Inc., Boston, MA (US); and President and Fellows of Harvard College, Cambridge, MA (US)
PCT Filed Dec. 15, 2018, PCT No. PCT/US2018/065886
§ 371(c)(1), (2) Date Jun. 12, 2020,
PCT Pub. No. WO2019/118949, PCT Pub. Date Jun. 20, 2019.
Claims priority of provisional application 62/669,771, filed on May 10, 2018.
Claims priority of provisional application 62/599,623, filed on Dec. 15, 2017.
Prior Publication US 2022/0238182 A1, Jul. 28, 2022
Int. Cl. G16B 20/30 (2019.01); A61K 31/7088 (2006.01); A61K 38/46 (2006.01); C12N 9/22 (2006.01); C12N 15/10 (2006.01); C12N 15/11 (2006.01); G16B 40/20 (2019.01)
CPC G16B 20/30 (2019.02) [A61K 31/7088 (2013.01); A61K 38/465 (2013.01); C12N 9/22 (2013.01); C12N 15/1089 (2013.01); C12N 15/11 (2013.01); G16B 40/20 (2019.02); C12N 2310/20 (2017.05); C12N 2800/80 (2013.01)]
13 Claims
 
1. A method of introducing a genetic change into a target genomic location encoding a pathogenic allele to modify the pathogenic allele to become a non-pathogenic allele using a Cas-based double strand break genome editing system, the method comprising:
using a computer hardware processor to perform:
selecting a guide RNA for use in introducing the genetic change into the target genomic location by analyzing inputs indicating a nucleotide sequence of the target genomic location and one or more available cut sites for the Cas-based double strand break genome editing system, the selecting comprising:
(a) determining a microhomology score matrix using a first neural network, the determining comprising:
determining a plurality of pairs of overhang sequences using the inputs;
determining a microhomology length vector and/or a microhomology GC fraction vector using the inputs; and
applying the first neural network to the plurality of pairs of overhang sequences and the microhomology length vector and/or the microhomology GC fraction vector to obtain the microhomology score matrix;
(b) determining a microhomology-independent score matrix using a second neural network, the determining comprising:
determining a deletion length vector using the inputs and the plurality of pairs of overhang sequences; and
applying the second neural network to the deletion length vector to obtain the microhomology-independent score matrix;
(c) determining a probability distribution over 1-bp insertions;
(d) determining, using the microhomology score matrix, the microhomology-independent score matrix and the probability distribution over 1-bp insertions, a probability distribution over indel genotypes and a probability distribution over indel lengths for the nucleotide sequence of the target genomic location and the one or more available cut sites;
(e) determining, using the probability distribution over indel genotypes and the probability distribution over indel lengths, for each guide RNA of a plurality of guide RNAs, a predicted frequency of introducing the genetic change into the target genomic location using the Cas-based double strand break genome editing system and the guide RNA;
(f) selecting, using the predicted frequencies of (e), a guide RNA of the plurality of guide RNAs for use in introducing the genetic change into the target genomic location using the Cas-based double strand break genome editing system; and
introducing the genetic change into the target genomic location using the guide RNA selected at (f) and the Cas-based double strand break genome editing system, wherein the genetic change is a 1 base pair insertion or a 1-60 base pair deletion that modifies the pathogenic allele to become the non-pathogenic allele.
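
Illustrative sketch (not part of the claim). The computational steps (a) through (f) of claim 1 can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the two networks are small untrained multilayer perceptrons with random weights, the 1-bp insertion model is a fixed placeholder, and every name here (`make_mlp`, `deletion_genotypes`, `repair_distribution`, `select_guide`, `ins_rate`) is hypothetical. Microhomology is approximated by counting how many deletion start positions produce the same repair product.

```python
import numpy as np

RNG = np.random.default_rng(0)

def make_mlp(n_in, n_hidden=8):
    # Hypothetical untrained weights; a real model would be fit to repair data.
    return (RNG.normal(size=(n_in, n_hidden)), np.zeros(n_hidden),
            RNG.normal(size=(n_hidden, 1)), np.zeros(1))

def mlp(params, x):
    w1, b1, w2, b2 = params
    return (np.tanh(x @ w1 + b1) @ w2 + b2)[:, 0]

MH_NET = make_mlp(2)      # step (a): inputs are microhomology length and GC fraction
MHLESS_NET = make_mlp(1)  # step (b): input is deletion length

def deletion_genotypes(seq, cut, max_del=60):
    """Group deletions spanning the cut by repair product.
    Returns {product: (del_len, mh_len, gc_frac)}."""
    starts_by_product = {}
    for del_len in range(1, max_del + 1):
        lo, hi = max(0, cut - del_len), min(cut, len(seq) - del_len)
        for start in range(lo, hi + 1):
            product = seq[:start] + seq[start + del_len:]
            starts_by_product.setdefault(product, []).append(start)
    out = {}
    for product, starts in starts_by_product.items():
        # k bases of microhomology give k + 1 equivalent start positions.
        mh_len = len(starts) - 1
        mh = seq[min(starts):min(starts) + mh_len]
        gc = sum(b in "GC" for b in mh) / mh_len if mh_len else 0.0
        out[product] = (len(seq) - len(product), mh_len, gc)
    return out

def repair_distribution(seq, cut, ins_rate=0.2, max_del=60):
    """Steps (a)-(d): distributions over indel genotypes and indel lengths."""
    dels = deletion_genotypes(seq, cut, max_del)
    products = list(dels)
    feats = np.array([dels[p] for p in products], dtype=float)
    # Microhomology score, zeroed for products without microhomology.
    mh_score = np.exp(mlp(MH_NET, feats[:, 1:3])) * (feats[:, 1] > 0)
    # Microhomology-independent score from (scaled) deletion length alone.
    mhless_score = np.exp(mlp(MHLESS_NET, feats[:, :1] / max_del))
    score = mh_score + mhless_score
    genotype = dict(zip(products, (1 - ins_rate) * score / score.sum()))
    # Step (c), placeholder assumption: the inserted base most often
    # duplicates the base immediately 5' of the cut (requires cut >= 1).
    for base in "ACGT":
        p_base = 0.7 if base == seq[cut - 1] else 0.1
        genotype[seq[:cut] + base + seq[cut:]] = ins_rate * p_base
    lengths = {}
    for product, p in genotype.items():
        delta = len(product) - len(seq)  # +1 insertion, negative for deletions
        lengths[delta] = lengths.get(delta, 0.0) + p
    return genotype, lengths

def select_guide(seq, cut_sites, desired):
    """Steps (e)-(f): score each guide by the predicted frequency of the
    desired product and return the best guide with all frequencies."""
    freq = {g: repair_distribution(seq, c)[0].get(desired, 0.0)
            for g, c in cut_sites.items()}
    return max(freq, key=freq.get), freq
```

As a usage example, `select_guide(seq, {"g1": 5, "g2": 14}, desired)` takes a mapping from hypothetical guide names to cut indices and returns the guide whose predicted repair distribution puts the most mass on the `desired` edited sequence. The numbers it produces are meaningless until the placeholder networks and insertion model are replaced with trained ones.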