| CPC G16B 20/30 (2019.02) [A61K 31/7088 (2013.01); A61K 38/465 (2013.01); C12N 9/22 (2013.01); C12N 15/1089 (2013.01); C12N 15/11 (2013.01); G16B 40/20 (2019.02); C12N 2310/20 (2017.05); C12N 2800/80 (2013.01)] | 13 Claims |
|
1. A method of introducing a genetic change into a target genomic location encoding a pathogenic allele to modify the pathogenic allele to become a non-pathogenic allele using a Cas-based double strand break genome editing system, the method comprising:
using a computer hardware processor to perform:
selecting a guide RNA for use in introducing the genetic change into the target genomic location by analyzing inputs indicating a nucleotide sequence of the target genomic location and one or more available cut sites for the Cas-based double strand break genome editing system, the selecting comprising:
(a) determining a microhomology score matrix using a first neural network, the determining comprising:
determining a plurality of pairs of overhang sequences using the inputs;
determining a microhomology length vector and/or a microhomology GC fraction vector using the inputs; and
applying the first neural network to the plurality of pairs of overhang sequences and the microhomology length vector and/or the microhomology GC fraction vector to obtain the microhomology score matrix;
(b) determining a microhomology-independent score matrix using a second neural network, the determining comprising:
determining a deletion length vector using the inputs and the plurality of pairs of overhang sequences; and
applying the second neural network to the deletion length vector to obtain the microhomology-independent score matrix;
(c) determining a probability distribution over 1-bp insertions;
(d) determining, using the microhomology score matrix, the microhomology-independent score matrix and the probability distribution over 1-bp insertions, a probability distribution over indel genotypes and a probability distribution over indel lengths for the nucleotide sequence of the target genomic location and the one or more available cut sites;
(e) determining, using the probability distribution over indel genotypes and the probability distribution over indel lengths, for each guide RNA of a plurality of guide RNAs, a predicted frequency of introducing the genetic change into the target genomic location using the Cas-based double strand break genome editing system and the guide RNA;
(f) selecting, using the predicted frequencies of (e), a guide RNA of the plurality of guide RNAs for use in introducing the genetic change into the target genomic location using the Cas-based double strand break genome editing system; and
introducing the genetic change into the target genomic location using the guide RNA selected at (f) and the Cas-based double strand break genome editing system, wherein the genetic change is a 1 base pair insertion or a 1-60 base pair deletion that modifies the pathogenic allele to become the non-pathogenic allele.
|
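
The data flow of steps (a) through (f) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the two neural networks are replaced by hypothetical closed-form scoring functions (`toy_mh_score`, `toy_mhless_score`), each guide RNA is identified only by its cut index, the 1-bp insertion distribution and insertion rate are assumed inputs, and every function and parameter name is illustrative rather than taken from the claim.

```python
import math

def mh_features(seq, start, length):
    """Microhomology length and GC fraction for deleting seq[start:start+length].

    Counts how many bases just upstream of the deletion match the last
    bases of the deleted segment (one simple way to define microhomology).
    """
    k = 0
    while (k < length and start - 1 - k >= 0
           and seq[start - 1 - k] == seq[start + length - 1 - k]):
        k += 1
    mh = seq[start - k:start]
    gc = sum(b in "GC" for b in mh) / k if k else 0.0
    return k, gc

def toy_mh_score(mh_len, gc):
    # Hypothetical stand-in for the first neural network of step (a).
    return math.exp(0.5 * mh_len + 0.5 * gc) if mh_len else 0.0

def toy_mhless_score(del_len):
    # Hypothetical stand-in for the second neural network of step (b).
    return math.exp(-0.1 * del_len)

def genotype_distribution(seq, cut, ins_probs, ins_rate=0.2, max_del=60):
    """Step (d): probability of each edited sequence for a cut before seq[cut].

    ins_probs (step (c)) and ins_rate are assumed inputs in this sketch.
    """
    scores = {}
    for length in range(1, max_del + 1):
        for start in range(max(0, cut - length), cut + 1):
            if start + length > len(seq):
                continue  # deletion would run past the end of the sequence
            k, gc = mh_features(seq, start, length)
            product = seq[:start] + seq[start + length:]
            score = toy_mh_score(k, gc) + toy_mhless_score(length)
            # Shifted deletions can yield the same product; keep the best score.
            scores[product] = max(scores.get(product, 0.0), score)
    total = sum(scores.values())
    dist = {p: (1.0 - ins_rate) * s / total for p, s in scores.items()}
    for base, prob in ins_probs.items():
        product = seq[:cut] + base + seq[cut:]
        dist[product] = dist.get(product, 0.0) + ins_rate * prob
    return dist

def indel_length_distribution(seq, dist):
    """Step (d): collapse genotypes to signed indel lengths (+1 insertion, -n deletion)."""
    lengths = {}
    for product, prob in dist.items():
        delta = len(product) - len(seq)
        lengths[delta] = lengths.get(delta, 0.0) + prob
    return lengths

def select_guide(seq, cut_sites, desired, ins_probs):
    """Steps (e)-(f): predicted frequency of the desired edit per cut site, then argmax."""
    freqs = {cut: genotype_distribution(seq, cut, ins_probs).get(desired, 0.0)
             for cut in cut_sites}
    best = max(freqs, key=freqs.get)
    return best, freqs

if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGT"}  # assumed 1-bp insertion distribution
    best, freqs = select_guide("TTACGGGACGAA", [5, 11], "TTACGAA", uniform)
    print(best, freqs)
```

In this toy input the desired product `TTACGAA` results from a 5-bp deletion flanked by the microhomology `ACG`, which is reachable from the cut at index 5 but not from the cut at index 11, so the selection step prefers the former. A real system would substitute trained models for the two toy scoring functions and a learned insertion model for the uniform distribution.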