US 11,929,149 B2
Systems and methods for genomic analysis
Alejandro Quiroz Zarate, Cambridge, MA (US); Roberto Olivares-Amaya, Somerville, MA (US); Thomas James Watson, Jr., Auburndale, MA (US); Helen Cecile Van Aggelen, Somerville, MA (US); Eduardo Coronado Sroka, Boston, MA (US); Carlos Antonio Angulo Sermeno, San Luis Potosi (MX); Fernando Fimbres Jurado, Leon (MX); Abraham Solis Garcia-Inda, Irapuato (MX); Fernando Fontove Herrera, Iraputo (MX); and Pablo G. Coste, Newton, MA (US)
Assigned to ARC BIO, LLC, Cambridge, MA (US)
Appl. No. 15/750,350
Filed by ARC BIO, LLC, Cambridge, MA (US)
PCT Filed Aug. 4, 2016, PCT No. PCT/US2016/045564
§ 371(c)(1), (2) Date Feb. 5, 2018,
PCT Pub. No. WO2017/024138, PCT Pub. Date Feb. 9, 2017.
Claims priority of provisional application 62/201,923, filed on Aug. 6, 2015.
Prior Publication US 2020/0090786 A1, Mar. 19, 2020
Int. Cl. G16B 30/10 (2019.01); G16B 5/00 (2019.01); G16B 25/10 (2019.01); G16B 50/30 (2019.01); G16B 45/00 (2019.01); G16B 40/10 (2019.01); G16B 30/20 (2019.01); G16B 40/20 (2019.01); C12Q 1/6869 (2018.01)
CPC G16B 30/10 (2019.02) [C12Q 1/6869 (2013.01); G16B 5/00 (2019.02); G16B 25/10 (2019.02); G16B 30/20 (2019.02); G16B 40/10 (2019.02); G16B 40/20 (2019.02); G16B 45/00 (2019.02); G16B 50/30 (2019.02)] 22 Claims
 
1. A method to identify novel variants, the method comprising:
(a) obtaining, using a hardware processor, a plurality of sequence reads;
(b) aligning, using the hardware processor, the plurality of sequence reads against a graph reference comprising a plurality of alternate paths that each represents a known variant, wherein the aligning comprises:
generating, using the hardware processor, a k-mer profile from each of the plurality of sequence reads; and
querying, using the hardware processor, each k-mer profile to an index of k-mer profiles generated from the graph reference,
wherein each k-mer in the index of k-mer profiles that corresponds to at least one alternate path is stored with a unique pointer for each of the at least one alternate paths that points to a data structure associated with that alternate path, the data structure including a sequence associated with the alternate path,
and a start offset and an end offset indicative of a location of the alternate path with respect to the graph reference; and
(c) using a subset of the plurality of sequence reads which abnormally align against one or more of the plurality of alternate paths to identify novel variants.
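
The indexing and querying recited in step (b) can be illustrated with a minimal sketch. It is not the patented implementation. The names AltPath, kmers, build_index, query and abnormal are hypothetical, as is the min_frac parameter. The sketch assumes that a k-mer profile is the list of overlapping k-mers of a sequence, that the end offset is exclusive, and that a Python object reference stands in for the claimed pointer. The claim does not define "abnormally align" here, so the sketch substitutes a toy criterion: a read is flagged when at least min_frac of its k-mers, but not all of them, hit an alternate path.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AltPath:
    """Hypothetical record for one alternate path (one known variant)."""
    sequence: str  # sequence associated with the alternate path
    start: int     # start offset with respect to the graph reference
    end: int       # end offset (assumed exclusive)


def kmers(seq, k):
    """Return the k-mer profile of seq as a list of overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(paths, k):
    """Map each k-mer to every alternate path whose sequence contains it."""
    index = defaultdict(list)
    for path in paths:
        # set() so a k-mer repeated within one path is stored once per path
        for kmer in set(kmers(path.sequence, k)):
            index[kmer].append(path)  # object reference acts as the pointer
    return index


def query(read, index, k):
    """Count, per alternate path, how many k-mers of the read hit that path."""
    hits = defaultdict(int)
    for kmer in kmers(read, k):
        for path in index.get(kmer, ()):
            hits[path] += 1
    return dict(hits)


def abnormal(read, index, k, min_frac=0.5):
    """Toy criterion (an assumption): partial but substantial k-mer support."""
    total = len(kmers(read, k))
    return [path for path, n in query(read, index, k).items()
            if min_frac * total <= n < total]
```

With a single alternate path AltPath("ACGTTGCA", 100, 108) and k = 4, the read "ACGTTGCA" matches on all five of its k-mers and is not flagged, while "ACGTTGGA" matches on three of five and is returned by abnormal as a candidate for step (c).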