CPC G16B 30/10 (2019.02) [C12Q 1/6869 (2013.01); G16B 5/00 (2019.02); G16B 25/10 (2019.02); G16B 30/20 (2019.02); G16B 40/10 (2019.02); G16B 40/20 (2019.02); G16B 45/00 (2019.02); G16B 50/30 (2019.02)] | 22 Claims |
1. A method to identify novel variants, the method comprising:
(a) obtaining, using a hardware processor, a plurality of sequence reads;
(b) aligning, using the hardware processor, the plurality of sequence reads against a graph reference comprising a plurality of alternate paths that each represents a known variant, wherein the aligning comprises:
generating, using the hardware processor, a k-mer profile from each of the plurality of sequence reads; and
querying, using the hardware processor, each k-mer profile to an index of k-mer profiles generated from the graph reference,
wherein each k-mer in the index of k-mer profiles that corresponds to at least one alternate path is stored with a unique pointer for each of the at least one alternate paths that points to a data structure associated with that alternate path, the data structure including a sequence associated with the alternate path,
and a start offset and an end offset indicative of a location of the alternate path with respect to the graph reference; and
(c) using a subset of the plurality of sequence reads which abnormally align against one or more of the plurality of alternate paths to identify novel variants.
|
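The index structure recited in step (b) can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a k-mer profile is simply the list of overlapping k-mers of a sequence, it indexes only the alternate-path sequences rather than the full graph reference, and it uses a Python object reference in place of the pointer. The names `AltPath`, `kmers`, `build_index` and `query`, along with the example sequences and offsets, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AltPath:
    """Record for one alternate path (field names are illustrative)."""
    sequence: str      # sequence associated with the alternate path
    start_offset: int  # start of the path with respect to the graph reference
    end_offset: int    # end of the path with respect to the graph reference


def kmers(seq, k):
    """Return the overlapping k-mers of seq (an assumed form of 'k-mer profile')."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(alt_paths, k):
    """Map each k-mer to the alternate-path records whose sequence contains it."""
    index = defaultdict(list)
    for path in alt_paths:
        # set() ensures one entry per (k-mer, path) pair even if the k-mer repeats
        for kmer in set(kmers(path.sequence, k)):
            index[kmer].append(path)  # object reference stands in for the pointer
    return index


def query(read, index, k):
    """Count, for each alternate path, how many k-mers of the read hit the index."""
    hits = defaultdict(int)
    for kmer in kmers(read, k):
        for path in index.get(kmer, ()):
            hits[path] += 1
    return dict(hits)


# Hypothetical data: two alternate paths that share the k-mer "ACGT".
path_a = AltPath("ACGTTGCA", 100, 101)
path_b = AltPath("ACGTAACC", 250, 253)
index = build_index([path_a, path_b], k=4)
result = query("CGTTGC", index, k=4)
```

In this sketch a k-mer shared by several alternate paths carries one reference per path, which mirrors the "unique pointer for each of the at least one alternate paths" language. How reads that "abnormally align" are selected in step (c) is not specified in the claim, so the sketch stops at per-path hit counts.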