CPC G16B 30/10 (2019.02) [G16B 30/00 (2019.02); G16B 30/20 (2019.02); G16B 50/00 (2019.02)] | 20 Claims |
1. A method for aligning one or more sequence reads to a genomic reference graph, the one or more sequence reads having been previously obtained from a biological sample from a subject, the method comprising:
using at least one processor to perform:
accessing at least one data structure representing the genomic reference graph, the genomic reference graph representing at least 1,000,000 nucleic acids and comprising nodes and edges connecting the nodes, the nodes including a first node and one or more parent nodes of the first node, the first node representing a first nucleotide sequence stored as a first string of symbols, wherein the at least one data structure stores data specifying the nodes and edges;
aligning the one or more sequence reads to the genomic reference graph using the at least one data structure and a dynamic programming algorithm, the aligning comprising, for each particular sequence read of the one or more sequence reads:
determining scores for entries in a first matrix associated with the first node, the first matrix representing a comparison between the particular sequence read and the first string of symbols, the determining comprising:
determining whether a symbol of the particular sequence read matches a first symbol of the first string of symbols;
accessing a score from one or more matrices associated with the one or more parent nodes of the first node; and
determining a score for an entry in the first matrix based on: (i) a result of determining whether the symbol of the particular sequence read matches the first symbol of the first string of symbols and (ii) the score accessed from the one or more matrices associated with the one or more parent nodes; and
aligning the particular sequence read to the genomic reference graph based on the determined scores; and
generating output indicative of results of aligning the one or more sequence reads to the genomic reference graph.
|
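The scoring steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a Smith-Waterman-style local alignment with a linear gap penalty, whereas the claim only recites "a dynamic programming algorithm". The names `Node`, `align_read`, `MATCH`, `MISMATCH`, and `GAP`, the score values, and the four-node example graph are all hypothetical. Each node gets its own matrix, and the first column of a node's matrix reads scores from the last column of each parent node's matrix.

```python
from dataclasses import dataclass, field

# Assumed scoring scheme; the claim does not specify score values.
MATCH, MISMATCH, GAP = 1, -1, -2


@dataclass
class Node:
    seq: str                                      # nucleotide sequence stored as a string of symbols
    parents: list = field(default_factory=list)   # ids of parent nodes


def align_read(read, graph, order):
    """Score `read` against every node of a directed acyclic `graph`.

    `graph` maps node id -> Node; `order` lists node ids so that parents
    come before children. Returns (best_score, node_id_of_best, matrices).
    """
    rows = len(read) + 1            # row 0 stands for the empty read prefix
    matrices = {}
    best_score, best_node = 0, None
    for nid in order:
        node = graph[nid]
        # Scores accessed from the matrices of the parent nodes:
        # the last column of each parent's matrix.
        parent_cols = [[row[-1] for row in matrices[p]] for p in node.parents]
        if not parent_cols:
            parent_cols = [[0] * rows]   # a root node starts from zeros
        H = [[0] * len(node.seq) for _ in range(rows)]
        for j, ref_symbol in enumerate(node.seq):
            # Column 0 looks back into the parents; later columns look
            # back into the previous column of the same matrix.
            prev_cols = parent_cols if j == 0 else [[row[j - 1] for row in H]]
            for i in range(1, rows):
                s = MATCH if read[i - 1] == ref_symbol else MISMATCH
                diag = max(col[i - 1] for col in prev_cols) + s
                left = max(col[i] for col in prev_cols) + GAP
                up = H[i - 1][j] + GAP
                H[i][j] = max(0, diag, left, up)
                if H[i][j] > best_score:
                    best_score, best_node = H[i][j], nid
        matrices[nid] = H
    return best_score, best_node, matrices


# Hypothetical graph: "ACG" branches into "T" or "G", then rejoins at "TA".
graph = {
    "a": Node("ACG"),
    "b": Node("T", parents=["a"]),
    "c": Node("G", parents=["a"]),
    "d": Node("TA", parents=["b", "c"]),
}
order = ["a", "b", "c", "d"]

score, end_node, matrices = align_read("ACGGTA", graph, order)
print(score, end_node)   # output indicative of the alignment result
```

The sketch stops at scoring. Recovering the aligned path would require a traceback through the per-node matrices. For a reference graph of the size recited in the claim, a production aligner would also need a far more compact matrix representation than nested Python lists.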