US 11,697,835 B2
Systems and methods for epigenetic analysis
Devin Locke, Medford, MA (US); and Wan-Ping Lee, Somerville, MA (US)
Assigned to Seven Bridges Genomics Inc., Charlestown, MA (US)
Filed by Seven Bridges Genomics Inc., Charlestown, MA (US)
Filed on Sep. 16, 2020, as Appl. No. 17/023,289.
Application 17/023,289 is a continuation of application No. 15/007,874, filed on Jan. 27, 2016, granted, now 10,793,895.
Claims priority of provisional application 62/209,058, filed on Aug. 24, 2015.
Prior Publication US 2020/0407778 A1, Dec. 31, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G01N 33/48 (2006.01); C12Q 1/6806 (2018.01); C12Q 1/6874 (2018.01); G16B 30/00 (2019.01); C12Q 1/6869 (2018.01); G16B 30/10 (2019.01)
CPC C12Q 1/6806 (2013.01) [C12Q 1/6869 (2013.01); C12Q 1/6874 (2013.01); G16B 30/00 (2019.02); G16B 30/10 (2019.02)] 19 Claims
 
1. A method for determining genomic modifications in a genome of a subject, the method comprising:
using at least one processor to perform:
obtaining a first sequence of nucleotide bases generated by sequencing nucleic acid from the subject;
creating, in at least one non-transitory storage medium, a directed acyclic graph (DAG) representing the first sequence of nucleotide bases, the DAG comprising nodes and edges connecting the nodes, the nodes including a first node representing a cytosine base in the first sequence at a position, a second node representing a thymine base not in the first sequence at the position, and a third node, wherein:
the first node is stored as a first object in the at least one non-transitory storage medium, the first object comprising a first symbol string representing the cytosine base,
the second node is stored as a second object in the at least one non-transitory storage medium, the second object comprising a second symbol string representing the thymine base,
the third node is stored as a third object in the at least one non-transitory storage medium, the third object comprising a third symbol string representing at least a part of the first sequence, and
the first object further comprises a first list of one or more pointers to one or more objects in the at least one non-transitory storage medium, the first list of one or more pointers being stored in the at least one non-transitory storage medium and including a pointer to the third object;
obtaining a second sequence of nucleotide bases generated by sequencing bisulfite-treated nucleic acid from the subject;
aligning the second sequence to the DAG to produce an alignment, at least in part by determining alignment scores between the second sequence and symbol strings associated with at least some of the nodes of the DAG, wherein determining an alignment score for the third node comprises, for a first symbol in the third symbol string, determining the alignment score for the third node based on an alignment score for a preceding node of the DAG;
identifying, based on the alignment, a corresponding cytosine base in the second sequence at the position that matches the cytosine base observed in the first sequence at the position; and
generating a report that identifies a methylated base in the genome of the subject at the position.
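The data structure and scoring recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The scoring values (MATCH, MISMATCH, GAP), the node names, the Smith-Waterman-style local scoring, and the helpers `lookup`, `align` and `call_methylation` are all hypothetical choices made for illustration. Each node object holds a symbol string and a list of pointers to successor objects. When the first symbol of a node is scored, the recurrence reads the last column of each preceding node. The example also assumes reads from the converted top strand only, so an unmethylated cytosine reads as T and a methylated one stays C.

```python
from dataclasses import dataclass, field

# Hypothetical scoring values; the claim does not specify a scoring scheme.
MATCH, MISMATCH, GAP = 2, -1, -2


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can serve as dict keys
class Node:
    """A DAG node object: a symbol string plus a list of pointers to successor objects."""
    name: str
    seq: str
    pointers: list = field(default_factory=list)


def topological_order(start):
    """Nodes reachable from `start`, ordered so each node comes after its predecessors."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for succ in node.pointers:
            visit(succ)
        order.append(node)

    visit(start)
    return order[::-1]


def lookup(H, cell, row):
    """Score and back-pointer for preceding column `cell` at `row` (None = zero boundary)."""
    if cell is None:
        return 0, None
    node, j = cell
    return H[node][row][j], (node, row, j)


def align(read, start):
    """Hypothetical local alignment of `read` to the DAG rooted at `start`.

    Returns (best score, names of the nodes on the best-scoring path)."""
    order = topological_order(start)
    preds = {node: [] for node in order}
    for node in order:
        for succ in node.pointers:
            preds[succ].append(node)

    H, back = {}, {}  # H[node][i][j]: best score of an alignment ending at read[:i], node.seq[j]
    best_score, best_cell = 0, None
    for node in order:
        H[node] = [[0] * len(node.seq) for _ in range(len(read) + 1)]
        for i in range(len(read) + 1):
            for j in range(len(node.seq)):
                # For the first symbol (j == 0) the preceding columns are the last columns of
                # all predecessor nodes; None stands for the all-zero boundary of a source node.
                if j > 0:
                    prevs = [(node, j - 1)]
                else:
                    prevs = [(p, len(p.seq) - 1) for p in preds[node]] or [None]
                cands = []
                for cell in prevs:
                    if i > 0:  # diagonal: read[i-1] aligned to node.seq[j]
                        base, ptr = lookup(H, cell, i - 1)
                        sub = MATCH if read[i - 1] == node.seq[j] else MISMATCH
                        cands.append((base + sub, ptr))
                    base, ptr = lookup(H, cell, i)  # deletion: the read skips node.seq[j]
                    cands.append((base + GAP, ptr))
                if i > 0:  # insertion: read[i-1] has no counterpart in the graph
                    cands.append((H[node][i - 1][j] + GAP, (node, i - 1, j)))
                score, ptr = max(cands, key=lambda c: c[0])
                if score <= 0:  # local alignment: restart rather than go negative
                    score, ptr = 0, None
                H[node][i][j] = score
                back[(node, i, j)] = ptr
                if score > best_score:
                    best_score, best_cell = score, (node, i, j)

    # Trace back from the best cell until the score drops to zero.
    path, cell = [], best_cell
    while cell is not None and H[cell[0]][cell[1]][cell[2]] > 0:
        if not path or path[-1] != cell[0].name:
            path.append(cell[0].name)
        cell = back[cell]
    return best_score, path[::-1]


def call_methylation(path, c_name="C", t_name="T"):
    """C kept in the bisulfite read -> methylated; C read as T -> unmethylated."""
    if c_name in path:
        return "methylated"
    if t_name in path:
        return "unmethylated"
    return "not covered"


# Hypothetical DAG for the first (untreated) sequence TTGACGATTA, with a C/T bubble at
# the CpG cytosine. The C object's pointer list includes a pointer to the suffix object.
suffix = Node("suffix", "GATTA")
c_node = Node("C", "C", [suffix])
t_node = Node("T", "T", [suffix])
prefix = Node("prefix", "TTGA", [c_node, t_node])

for read in ("TTGACGATTA", "TTGATGATTA"):  # bisulfite reads: C retained vs. C converted to T
    score, path = align(read, prefix)
    print(read, score, path, call_methylation(path))
```

With these inputs, the first read scores 20 through the C branch and is reported as methylated, while the second scores 20 through the T branch and is reported as unmethylated. Routing the read through the C node corresponds to identifying a cytosine in the bisulfite read that matches the cytosine observed in the untreated sequence.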