US 11,789,906 B2
Systems and methods for genomic manipulations and analysis
Roberto Olivares-Amaya, Somerville, MA (US); David Andrew Sinclair, Chestnut Hill, MA (US); Alejandro Quiroz-Zarate, Cambridge, MA (US); and Thomas J. Watson, Jr., Newton, MA (US)
Assigned to ARC BIO, LLC, Cambridge, MA (US)
Appl. No. 15/524,319
Filed by ARC BIO, LLC, Cambridge, MA (US)
PCT Filed Nov. 19, 2015, PCT No. PCT/US2015/061548
§ 371(c)(1), (2) Date May 4, 2017,
PCT Pub. No. WO2016/081712, PCT Pub. Date May 26, 2016.
Claims priority of provisional application 62/081,931, filed on Nov. 19, 2014.
Prior Publication US 2017/0357665 A1, Dec. 14, 2017
Int. Cl. G06F 16/174 (2019.01); G16B 50/00 (2019.01); G16B 50/50 (2019.01)
CPC G06F 16/1744 (2019.01) [G16B 50/00 (2019.02); G16B 50/50 (2019.02)] 25 Claims
 
1. A method for preparing and analyzing DNA comprising:
a. obtaining from a subject a sample, wherein the sample comprises DNA;
b. preparing the sample for DNA sequencing to obtain a prepared sample;
c. performing a sequencing reaction on the prepared sample using a sequencer to obtain raw sequence information comprising a plurality of primary characters each corresponding to one of a plurality of molecular units, wherein the raw sequence information includes at least 100 primary characters;
d. transmitting the raw sequence information to a computing device;
e. identifying a position in the raw sequence information corresponding to a secondary character representing a molecular unit that is not positively identified as corresponding to one of the plurality of primary characters;
f. removing the secondary character from the raw sequence information;
g. encoding the position of the secondary character as first position information;
h. transforming the raw sequence information, excluding at least the secondary character, into a compressed data set using a fixed encoding scheme;
i. transferring the compressed data set and the first position information to a computer memory, wherein the compressed data set and the first position information together utilize less than 80% of the memory required for the raw sequence information before compression; and
j. accessing the transferred compressed data set and the first position information from the computer memory using a graphical user interface (GUI) to retrieve and visualize genomic information.
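Steps e through i of claim 1 can be illustrated with a minimal Python sketch. The sketch rests on several assumptions that the claim does not state. It treats A, C, G and T as the primary characters and any other character (typically 'N') as a secondary character. It uses a 2-bit-per-base code as the fixed encoding scheme, and it stores the first position information as (start, length) runs packed as little-endian uint32 pairs. All function names are hypothetical.

```python
import struct
from typing import List, Tuple

# Assumed fixed encoding scheme: each primary character gets a 2-bit code.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

Run = Tuple[int, int]  # (start position in the raw sequence, run length)


def split_secondary(raw: str) -> Tuple[str, List[Run]]:
    """Steps e-g: locate characters outside the primary alphabet (e.g. 'N'),
    record them as (start, length) runs in raw coordinates, and drop them."""
    kept: List[str] = []
    runs: List[Run] = []
    for i, ch in enumerate(raw):
        if ch in CODE:
            kept.append(ch)
        elif runs and runs[-1][0] + runs[-1][1] == i:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((i, 1))  # start a new run
    return "".join(kept), runs


def encode_positions(runs: List[Run]) -> bytes:
    """First position information (assumed layout): two uint32 values per run."""
    return b"".join(struct.pack("<II", start, length) for start, length in runs)


def pack_2bit(seq: str) -> bytes:
    """Step h: fixed 2-bit encoding, four bases per byte, first base in the high bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, ch in enumerate(seq):
        out[i // 4] |= CODE[ch] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack_2bit(data: bytes, n: int) -> str:
    """Inverse of pack_2bit; n is the number of bases that were packed."""
    return "".join(BASES[(data[i // 4] >> (6 - 2 * (i % 4))) & 3] for i in range(n))


def restore(seq: str, runs: List[Run], secondary: str = "N") -> str:
    """Re-insert secondary characters (assumed all to be 'N') at their raw positions.
    Runs are applied in ascending order, so each start index is already valid."""
    out = list(seq)
    for start, length in runs:
        out[start:start] = secondary * length
    return "".join(out)


def compression_ratio(raw: str) -> float:
    """Step i check: compressed bytes / raw bytes (raw stored at 1 byte per character).
    Compressed size = 4-byte base count + packed bases + position information."""
    kept, runs = split_secondary(raw)
    compressed = 4 + len(pack_2bit(kept)) + len(encode_positions(runs))
    return compressed / len(raw.encode("ascii"))


raw = "ACGT" * 30 + "NNNN" + "GATTACA" * 10  # 194 characters, 190 of them primary
kept, runs = split_secondary(raw)              # runs == [(120, 4)]
assert restore(unpack_2bit(pack_2bit(kept), len(kept)), runs) == raw
print(round(compression_ratio(raw), 3))        # 60 / 194 bytes -> 0.309
```

With these assumptions, a sequence containing few secondary characters compresses to roughly a quarter of its one-byte-per-character size, well under the 80% bound recited in step i. Long runs of secondary characters cost only 8 bytes per run. Many isolated ambiguous bases, however, would make the position information dominate.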