US 11,789,906 B2
	Systems and methods for genomic manipulations and analysis
Roberto Olivares-Amaya, Somerville, MA (US); David Andrew Sinclair, Chestnut Hill, MA (US); Alejandro Quiroz-Zarate, Cambridge, MA (US); and Thomas J. Watson, Jr., Newton, MA (US)
Assigned to ARC BIO, LLC, Cambridge, MA (US)
Appl. No. 15/524,319
Filed by ARC BIO, LLC, Cambridge, MA (US)
PCT Filed Nov. 19, 2015, PCT No. PCT/US2015/061548 § 371(c)(1), (2) Date May 4, 2017, PCT Pub. No. WO2016/081712, PCT Pub. Date May 26, 2016.
Claims priority of provisional application 62/081,931, filed on Nov. 19, 2014.
Prior Publication US 2017/0357665 A1, Dec. 14, 2017
Int. Cl. G06F 16/174 (2019.01); G16B 50/00 (2019.01); G16B 50/50 (2019.01)

CPC G06F 16/1744 (2019.01) [G16B 50/00 (2019.02); G16B 50/50 (2019.02)]

25 Claims

1. A method for preparing and analyzing DNA comprising:

a. obtaining from a subject a sample, wherein the sample comprises DNA;

b. preparing the sample for DNA sequencing to obtain a prepared sample;

c. performing a sequencing reaction on the prepared sample using a sequencer to obtain raw sequence information comprising a plurality of primary characters each corresponding to one of a plurality of molecular units, wherein the raw sequence information includes at least 100 primary characters;

d. transmitting the raw sequence information to a computing device;

e. identifying a position in the raw sequence information corresponding to a secondary character representing a molecular unit that is not positively identified as corresponding to one of the plurality of primary characters;

f. removing the secondary character from the raw sequence information;

g. encoding the position of the secondary character as first position information;

h. transforming the raw sequence information, excluding at least the secondary character, into a compressed data set using a fixed encoding scheme;

i. transferring the compressed data set and the first position information to a computer memory, wherein the compressed data set and the first position information utilize less than 80% of the memory required for the raw sequence information before compression; and

j. accessing the transferred compressed data set and the first position information from the computer memory using a graphical user interface (GUI) to retrieve and visualize genomic information.
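Steps (e) through (i), together with the retrieval half of step (j), can be illustrated with a minimal round-trip sketch. Claim 1 does not specify the alphabet, the fixed encoding scheme, or the format of the position information, so every one of the following is an assumption made only for illustration: the primary characters are A, C, G and T; any other symbol (for example an 'N' base call) is treated as a secondary character; the fixed encoding scheme is 2 bits per base; secondary characters are recorded as hypothetical (start, length, symbol) runs; and the uncompressed read occupies 1 byte per character. The names `compress`, `decompress` and `stored_size`, and the 9-bytes-per-run storage cost, are likewise hypothetical. This is not the patentee's implementation, and the GUI visualization in step (j) is not shown.

```python
"""Illustrative sketch of claim 1, steps (e)-(i), plus decoding for step (j).

Assumptions (not from the patent): primary characters are A, C, G, T; any
other symbol is a secondary character; the fixed encoding is 2 bits per base;
positions are stored as (start, length, symbol) runs; raw size is 1 byte/char.
"""
from typing import List, Tuple

PRIMARY = "ACGT"  # hypothetical alphabet of primary characters
CODE = {base: i for i, base in enumerate(PRIMARY)}  # fixed 2-bit codes

Run = Tuple[int, int, str]  # (start in raw read, run length, secondary symbol)


def extract_secondary(raw: str) -> Tuple[str, List[Run]]:
    """Steps (e)-(g): locate secondary characters, remove them, record positions."""
    runs: List[Run] = []
    kept: List[str] = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch in CODE:
            kept.append(ch)
            i += 1
            continue
        start = i
        while i < len(raw) and raw[i] == ch:  # collapse a run of the same symbol
            i += 1
        runs.append((start, i - start, ch))
    return "".join(kept), runs


def pack_2bit(bases: str) -> bytes:
    """Step (h): fixed-width encoding, four primary characters per byte."""
    out = bytearray((len(bases) + 3) // 4)
    for k, base in enumerate(bases):
        out[k // 4] |= CODE[base] << (2 * (k % 4))
    return bytes(out)


def unpack_2bit(packed: bytes, n: int) -> str:
    """Inverse of pack_2bit for the first n primary characters."""
    return "".join(PRIMARY[(packed[k // 4] >> (2 * (k % 4))) & 0b11] for k in range(n))


def compress(raw: str) -> Tuple[bytes, int, List[Run]]:
    """Steps (e)-(h): returns packed bases, their count, and the position runs."""
    bases, runs = extract_secondary(raw)
    return pack_2bit(bases), len(bases), runs


def decompress(packed: bytes, n: int, runs: List[Run]) -> str:
    """Retrieval for step (j): re-insert each secondary run at its position."""
    bases = unpack_2bit(packed, n)
    out: List[str] = []
    b = pos = 0
    for start, length, symbol in runs:
        take = start - pos  # primary characters preceding this run
        out.append(bases[b:b + take])
        b += take
        out.append(symbol * length)
        pos = start + length
    out.append(bases[b:])
    return "".join(out)


def stored_size(packed: bytes, runs: List[Run]) -> int:
    """Hypothetical stored size in bytes: a 4-byte base count, the packed bases,
    and 9 bytes per run (two 32-bit integers plus one symbol byte)."""
    return 4 + len(packed) + 9 * len(runs)


if __name__ == "__main__":
    raw = "ACGT" * 30 + "NNNNN" + "TTGCA" * 10  # at least 100 characters, per step (c)
    packed, n, runs = compress(raw)
    assert decompress(packed, n, runs) == raw
    ratio = stored_size(packed, runs) / len(raw)  # raw assumed to be 1 byte/char
    assert ratio < 0.8  # the step (i) threshold
    print(runs, round(ratio, 3))
```

Storing secondary characters as start/length runs, rather than one entry per position, keeps the position information small when ambiguous calls cluster together. The UCSC .2bit format uses a similar approach for its N blocks. Because 2-bit packing alone brings the bases to about 25% of a 1-byte-per-character representation, whether a read meets the step (i) limit under this sketch depends mainly on how many runs it contains.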