US 12,431,217 B2
Systems and methods for use of known alleles in read mapping
Deniz Kural, Charlestown, MA (US)
Assigned to Seven Bridges Genomics Inc., Charlestown, MA (US)
Filed by Seven Bridges Genomics Inc., Charlestown, MA (US)
Filed on Nov. 11, 2020, as Appl. No. 17/095,206.
Application 17/095,206 is a continuation of application No. 14/592,444, filed on Jan. 8, 2015, granted, now 10,867,693.
Claims priority of provisional application 61/925,892, filed on Jan. 10, 2014.
Prior Publication US 2021/0265012 A1, Aug. 26, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G16B 30/10 (2019.01); G16B 20/20 (2019.01); G16B 30/00 (2019.01); G16B 30/20 (2019.01)
CPC G16B 30/10 (2019.02) [G16B 30/00 (2019.02); G16B 20/20 (2019.02); G16B 30/20 (2019.02)]
19 Claims
 
1. A method for determining a genomic sequence, the method comprising:
receiving, at a computer system, genetic information identifying one or more single nucleotide polymorphisms (SNPs) at one or more respective positions in a genome of a subject;
receiving a plurality of genomic sequences as a genomic directed acyclic graph (DAG) that represents at least two alternative sequences per position at multiple positions and comprises a list of non-compossible node pairs, wherein at least one path through the genomic DAG corresponds to at least one substantially entire sequence of at least one human chromosome, the genomic DAG stored in the computer system and comprising a plurality of nodes and edges, the nodes representing nucleotide sequences, and the edges connecting pairs of the nodes, wherein each of the one or more SNPs is represented by at least one node in the genomic DAG, wherein the plurality of nodes and edges is stored as objects in a memory of the computer system and an object stores a list of pointers specifying one or more locations in the memory where one or more adjacent objects are stored;
identifying a plurality of candidate paths through the genomic DAG that include the one or more SNPs by identifying a node in the list of non-compossible node pairs using one of the one or more SNPs, identifying a second node paired to the identified node in the list of non-compossible node pairs, and excluding paths containing the second node;
receiving sequence reads obtained by sequencing a biological sample from the subject; and
mapping the sequence reads to the plurality of candidate paths, including one or more of the at least one path through the genomic DAG corresponding to the at least one substantially entire sequence of at least one human chromosome, to identify a nucleotide sequence of at least a portion of the genome.
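
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the patented implementation: the names Node, link, all_paths, candidate_paths, path_sequence and map_reads are hypothetical, the graph holds a dozen bases rather than a chromosome, paths are enumerated exhaustively (feasible only at toy scale), and exact substring matching stands in for a real read aligner. Each node is an object holding a list of references to its adjacent objects, a known SNP is looked up in the list of non-compossible node pairs, paths containing its paired node are excluded, and reads are then mapped only to the remaining candidate paths.

```python
class Node:
    """A DAG node stored as an object; `next` is a list of references
    (Python's analogue of pointers) to the adjacent Node objects."""
    def __init__(self, name, seq):
        self.name = name
        self.seq = seq
        self.next = []

def link(src, *dsts):
    """Add directed edges from src to each node in dsts."""
    src.next.extend(dsts)

def all_paths(node, sink):
    """Yield every path from node to sink (toy-scale exhaustive walk)."""
    if node is sink:
        yield [node]
        return
    for nxt in node.next:
        for rest in all_paths(nxt, sink):
            yield [node] + rest

def candidate_paths(source, sink, snp_nodes, non_compossible):
    """Keep paths that contain every known SNP node and none of the
    nodes paired with a known SNP node in the non-compossible list."""
    excluded = set()
    for a, b in non_compossible:
        if a in snp_nodes:
            excluded.add(b)
        if b in snp_nodes:
            excluded.add(a)
    kept = []
    for path in all_paths(source, sink):
        names = {n.name for n in path}
        if snp_nodes <= names and not (names & excluded):
            kept.append(path)
    return kept

def path_sequence(path):
    """Concatenate node sequences along a path."""
    return "".join(n.seq for n in path)

def map_reads(reads, paths):
    """Assumption: exact substring match as a stand-in for alignment.
    Returns read -> list of (candidate path index, offset)."""
    seqs = [path_sequence(p) for p in paths]
    return {r: [(i, s.find(r)) for i, s in enumerate(seqs) if r in s]
            for r in reads}

# Hypothetical toy graph with three two-allele positions.
start, mid, mid2, end = Node("start", "ACG"), Node("mid", "TT"), Node("mid2", "GA"), Node("end", "CC")
s1_A, s1_G = Node("s1_A", "A"), Node("s1_G", "G")
s2_C, s2_T = Node("s2_C", "C"), Node("s2_T", "T")
s3_A, s3_C = Node("s3_A", "A"), Node("s3_C", "C")
link(start, s1_A, s1_G); link(s1_A, mid); link(s1_G, mid)
link(mid, s2_C, s2_T); link(s2_C, mid2); link(s2_T, mid2)
link(mid2, s3_A, s3_C); link(s3_A, end); link(s3_C, end)

snp_nodes = {"s1_G"}                    # SNP known from the subject's genetic information
non_compossible = [("s1_G", "s2_T")]    # assumed pair: these alleles never co-occur

candidates = candidate_paths(start, end, snp_nodes, non_compossible)
hits = map_reads(["TTCGAC", "GGTTC", "TTTGA"], candidates)
```

In this toy graph, eight paths exist in total. Requiring s1_G and excluding its paired node s2_T leaves two candidate paths, so a read that could only arise from the s2_T allele finds no mapping location.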