US 11,929,145 B2
Methods for non-invasive assessment of genetic alterations
Mostafa Azab, San Diego, CA (US); Michael Sykes, San Diego, CA (US); Youting Sun, San Diego, CA (US); Amin Mazloom, Del Mar, CA (US); Taylor Jensen, San Diego, CA (US); Mathias Ehrich, San Diego, CA (US); and Christopher Ellison, San Diego, CA (US)
Assigned to SEQUENOM, INC., San Diego, CA (US)
Appl. No. 16/477,931
Filed by Sequenom, Inc., San Diego, CA (US)
PCT Filed Jan. 22, 2018, PCT No. PCT/US2018/014726
§ 371(c)(1), (2) Date Jul. 15, 2019,
PCT Pub. No. WO2018/136888, PCT Pub. Date Jul. 26, 2018.
Claims priority of provisional application 62/448,601, filed on Jan. 20, 2017.
Claims priority of provisional application 62/448,600, filed on Jan. 20, 2017.
Prior Publication US 2019/0371429 A1, Dec. 5, 2019
Int. Cl. G16B 20/20 (2019.01); C12Q 1/6816 (2018.01); G16B 20/10 (2019.01); G16B 20/40 (2019.01); G16B 25/00 (2019.01); G16B 30/10 (2019.01)
CPC G16B 20/20 (2019.02) [C12Q 1/6816 (2013.01); G16B 20/10 (2019.02); G16B 25/00 (2019.02); G16B 30/10 (2019.02); G16B 20/40 (2019.02)]
22 Claims
 
1. A method for determining a presence or absence of a genetic alteration for a test subject, comprising:
obtaining circulating cell free nucleic acid from a sample from the test subject;
ligating nucleic acid molecules of the circulating cell free nucleic acid with adapters to generate a plurality of sequence constructs, wherein:
each sequence construct comprises: an adapter ligated to an end of a nucleic acid molecule,
each adapter is a single-stranded non-random oligonucleotide or a double-stranded non-random oligonucleotide, and
each single-stranded non-random oligonucleotide or double-stranded non-random oligonucleotide comprises at least one single molecule barcode (SMB) having a predetermined non-randomly generated molecular barcode sequence of nucleotides;
generating, using a first polymerase chain reaction, library constructs for each sequence construct, wherein each library construct for a given sequence construct comprises a same sequence of nucleotides for at least one SMB and a nucleic acid molecule;
capturing a subset of the library constructs using probe oligonucleotides under hybridization conditions to enrich for one or more genomic regions of interest, wherein the probe oligonucleotides span the one or more genomic regions of interest;
generating, using a second polymerase chain reaction, enriched library constructs for each library construct of the subset of the library constructs, wherein each enriched library construct for a given library construct of the subset of the library constructs comprises a same sequence of nucleotides for at least one SMB and a nucleic acid molecule;
sequencing the enriched library constructs to obtain sequence reads;
generating an alignment computer file comprising on-target sequence reads and associated genomic positioning data, wherein:
the generating the alignment computer file comprises aligning the sequence reads to a reference genome to identify the on-target sequence reads and obtain the genomic positioning data,
the genomic positioning data is informative of a start position and an end position of each on-target sequence read aligned to the reference genome, and
the at least one SMB and the genomic positioning data provide a unique identity to the nucleic acid molecule represented in each of the on-target sequence reads;
generating, by running programming language scripts on the alignment computer file, a duplicate marked alignment computer file comprising an entry for each on-target sequence read, wherein the generating the duplicate marked alignment computer file comprises:
assigning the on-target sequence reads to read groups according to read group signatures, wherein:
each of the read group signatures comprises at least one SMB sequence and genomic positioning data informative of a start position and an end position of a nucleic acid molecule, and
an on-target sequence read is assigned to a read group when the at least one SMB of the on-target sequence read and the associated genomic positioning data are similar to the at least one SMB sequence and the genomic positioning data of a read group signature associated with the read group; and
identifying each of the on-target sequence reads assigned to a same read group as duplicate reads in the duplicate marked alignment computer file by adjusting a flag in the alignment computer file and associating a unique read group numerical identifier with the entry of each of the on-target sequence reads;
generating, using the duplicate marked alignment computer file, a final alignment computer file comprising a consensus sequence for each of the read groups, wherein the consensus sequence for each of the read groups is generated by collapsing the on-target sequence reads assigned to each read group into a consensus sequence based on the flag and the unique read group numerical identifier for each of the on-target sequence reads;
determining the presence or absence of the genetic alteration based on the consensus sequence for each of the read groups in the final alignment computer file; and
outputting a report concerning the presence or absence of the genetic alteration for the test subject, wherein the report comprises (i) the one or more genomic regions of interest, and (ii) a status of the genetic alteration corresponding to the one or more genomic regions of interest.
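The read-grouping, duplicate-marking, and consensus steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Read` record, the function names, the Hamming-distance and position-tolerance thresholds used to decide that a read is "similar" to a read group signature, and the per-position majority vote used for collapsing are all assumptions made for illustration. The claim says only that each read in a group is identified as a duplicate by adjusting a flag; the sketch borrows the SAM duplicate bit (0x400) and sets it only on reads that share a group with at least one other read, which is likewise an assumption.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

DUP_FLAG = 0x400  # SAM flag bit meaning "PCR or optical duplicate"


@dataclass
class Read:
    """Hypothetical stand-in for one entry of the alignment file."""
    name: str
    smb: str            # single molecule barcode sequence
    start: int          # aligned start position
    end: int            # aligned end position
    seq: str            # read bases
    flag: int = 0
    group_id: Optional[int] = None


def hamming(a: str, b: str) -> int:
    """Mismatch count; barcodes of unequal length are treated as dissimilar."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def mark_duplicates(reads: List[Read],
                    max_smb_mismatch: int = 1,
                    pos_tolerance: int = 0) -> List[Read]:
    """Assign reads to read groups by (SMB, start, end) signature.

    Thresholds are illustrative assumptions; the claim only requires that
    the SMB and positions be "similar" to a read group signature.
    """
    signatures = []  # signature i defines read group number i + 1
    for read in reads:
        for gid, (smb, start, end) in enumerate(signatures, start=1):
            if (hamming(read.smb, smb) <= max_smb_mismatch
                    and abs(read.start - start) <= pos_tolerance
                    and abs(read.end - end) <= pos_tolerance):
                read.group_id = gid
                break
        else:
            # No existing signature matched: this read founds a new group.
            signatures.append((read.smb, read.start, read.end))
            read.group_id = len(signatures)

    # Adjust the flag on every read that shares its group with another read.
    sizes = Counter(r.group_id for r in reads)
    for read in reads:
        if sizes[read.group_id] > 1:
            read.flag |= DUP_FLAG
    return reads


def collapse(reads: List[Read]) -> Dict[int, str]:
    """Collapse each read group into one consensus by per-position majority."""
    groups = defaultdict(list)
    for read in reads:
        groups[read.group_id].append(read.seq)
    consensus = {}
    for gid, seqs in groups.items():
        length = min(len(s) for s in seqs)  # compare only the shared prefix
        consensus[gid] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


if __name__ == "__main__":
    demo = [
        Read("r1", "ACGT", 100, 250, "ACGTAC"),
        Read("r2", "ACGT", 100, 250, "ACGTAC"),
        Read("r3", "ACGA", 100, 250, "ACTTAC"),  # one SMB mismatch, one base error
        Read("r4", "TTTT", 100, 250, "GGGGGG"),  # same span, different SMB
    ]
    mark_duplicates(demo)
    for gid, seq in collapse(demo).items():
        print(gid, seq)
```

In this sketch a read joins the first signature it matches, so the grouping depends on input order. A production pipeline would more likely operate on a coordinate-sorted alignment file and resolve ambiguous matches explicitly. The variant-calling and reporting steps of the claim are not shown.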