US 11,725,237 B2
Polymorphic gene typing and somatic change detection using sequencing data
Sachet Ashok Shukla, Newton, MA (US); Catherine Ju-Ying Wu, Brookline, MA (US); and Gad Getz, Belmont, MA (US)
Assigned to The Broad Institute Inc., Cambridge, MA (US); Dana-Farber Cancer Institute, Inc., Boston, MA (US); and The General Hospital Corporation, Boston, MA (US)
Appl. No. 15/037,394
Filed by The Broad Institute Inc., Cambridge, MA (US); Dana-Farber Cancer Institute, Inc., Boston, MA (US); and The General Hospital Corporation, Boston, MA (US)
PCT Filed Dec. 5, 2014, PCT No. PCT/US2014/068746
§ 371(c)(1), (2) Date May 18, 2016,
PCT Pub. No. WO2015/085147, PCT Pub. Date Jun. 11, 2015.
Claims priority of provisional application 61/912,305, filed on Dec. 5, 2013.
Prior Publication US 2016/0298185 A1, Oct. 13, 2016
Int. Cl. C12Q 1/6874 (2018.01); G16B 20/00 (2019.01); G16B 30/00 (2019.01); C12Q 1/6827 (2018.01); C12Q 1/6886 (2018.01); G16B 30/10 (2019.01); G16B 20/20 (2019.01); G16B 20/40 (2019.01); C12Q 1/6881 (2018.01)
CPC C12Q 1/6874 (2013.01) [C12Q 1/6827 (2013.01); C12Q 1/6881 (2013.01); C12Q 1/6886 (2013.01); G16B 20/00 (2019.02); G16B 20/20 (2019.02); G16B 20/40 (2019.02); G16B 30/00 (2019.02); G16B 30/10 (2019.02); C12Q 2600/156 (2013.01)] 4 Claims
 
1. A method of treating cancer, comprising:
(i) identifying a polymorphic gene type that encodes a human leukocyte antigen (HLA) protein predicted to bind to a neo-epitope comprising performing a computer-implemented method for genotyping polymorphic genes of a patient to identify the polymorphic gene type, the computer-implemented method comprising:
(a) inputting sequence reads extracted from a target polymorphic gene of the patient into a non-transitory computer-executable storage device having a computer-readable and computer-executable program for gene typing and alignment analysis;
(b) generating alignments of the sequence reads extracted from the target polymorphic gene of the patient to a gene reference sequence set having a plurality of gene reference sequences, each gene reference sequence in the gene reference sequence set corresponding to an allele variant of the target polymorphic gene;
(c) determining a first posterior probability or first posterior probability derived score for each allele variant in the alignments based on the sequence reads aligned to each allele variant;
(d) determining a second posterior probability or posterior probability derived score for each allele variant in the gene reference sequence set,
wherein a weighting factor is applied to a score contribution of each aligned sequence read based on whether or not the sequence read was also aligned to a first allele variant with a maximum first posterior probability or posterior probability derived score,
wherein the weighting factor is based on the corresponding first posterior probability or posterior probability derived score for each of one or more overlapping sequence reads that aligned with the first allele variant and also aligned with one or more other allele variants in the alignments;
wherein the first allele variant and a second allele variant with a maximum second posterior probability or posterior probability derived score indicate the polymorphic gene type, and
wherein the weighting factor for a given read mapping to the identified first allele variant and the other allele variant is equal to the contribution of the sequence read to an overall posterior probability or posterior probability derived score of other allele variant (s1) divided by a sum of that contribution and a contribution of the sequence read to an overall posterior probability or posterior probability derived score of the first allele variant (s2),
wherein the weighting factor w=s1/(s1+s2), and a new contribution of the sequence read to the overall posterior probability or posterior probability derived score of the other allele variant=w*s1, and
wherein the first and the second posterior probability or posterior probability derived scores are determined based on base quality scores and an insert size probability value for each sequence read in the alignment, and wherein the insert size probability value is based at least in part on an insert size distribution of all of the sequence reads extracted from the target polymorphic gene of the patient, or
wherein the first and second posterior probabilities or posterior probability derived scores are calculated based at least in part on population-based allele probabilities observed in a known population data set;
(ii) predicting a human leukocyte antigen (HLA) encoded by the polymorphic gene type identified in (i) above that binds to the neo-epitope;
(iii) preparing a personalized treatment composition, wherein the personalized treatment composition comprises:
(a) neo-epitopes predicted to bind to a protein encoded by the polymorphic gene type identified in (i) above;
(b) a polynucleotide encoding neo-epitopes predicted to bind to the HLA protein predicted in (ii) above and encoded by the polymorphic gene type identified in (i) above;
(c) antigen presenting cells (APCs) comprising neo-epitopes predicted to bind to the HLA protein encoded by a polymorphic gene type indicated in (ii) above or a polynucleotide encoding neo-epitopes predicted to bind to the HLA protein predicted in (ii) above and encoded by a polymorphic gene type identified in (i) above; or
(d) T cells stimulated with APCs comprising neo-epitopes predicted to bind to the HLA protein predicted in (ii) above and encoded by the polymorphic gene type identified in (i) above or a polynucleotide encoding neo-epitopes predicted to bind to a protein encoded by a polymorphic gene type identified in (i) above, and
(iv) administering an effective amount of the personalized treatment composition to the patient, wherein the neo-epitopes and the identified polymorphic gene type are both present in the patient.
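
The two-pass scoring recited in steps (c) and (d) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`score_alleles`, `second_pass`, `call_genotype`), the input layout (a mapping from allele name to per-read score contributions), and the toy numbers are all hypothetical. The sketch further assumes that contributions are positive, non-logarithmic values that are simply summed, and that the first allele variant is excluded from the second pass; the claim does not specify either point. Base qualities, insert size probabilities, and population priors are assumed to be already folded into each contribution.

```python
def score_alleles(contrib):
    """First pass: sum the per-read contributions for each allele variant."""
    return {allele: sum(reads.values()) for allele, reads in contrib.items()}


def second_pass(contrib, first):
    """Second pass: re-score every other allele, down-weighting shared reads.

    For a read aligned to both the first allele and another allele:
        s1 = contribution of the read to the other allele
        s2 = contribution of the read to the first allele
        w  = s1 / (s1 + s2)
        new contribution to the other allele = w * s1
    Reads not aligned to the first allele keep their contribution unchanged.
    """
    first_reads = contrib[first]
    rescored = {}
    for allele, reads in contrib.items():
        if allele == first:
            continue  # assumption: the first allele is not re-scored
        total = 0.0
        for read, s1 in reads.items():
            s2 = first_reads.get(read)
            if s2 is None or (s1 + s2) == 0:
                total += s1
            else:
                w = s1 / (s1 + s2)
                total += w * s1
        rescored[allele] = total
    return rescored


def call_genotype(contrib):
    """Return (first allele, second allele) from the two scoring passes."""
    first_scores = score_alleles(contrib)
    first = max(first_scores, key=first_scores.get)
    second_scores = second_pass(contrib, first)
    second = max(second_scores, key=second_scores.get)
    return first, second


# Made-up contributions for illustration only: allele -> {read id: score}.
toy = {
    "A*01:01": {"r1": 0.9, "r2": 0.8, "r3": 0.7},
    "A*01:02": {"r1": 0.9, "r2": 0.8, "r3": 0.6},
    "A*02:01": {"r3": 0.7, "r4": 0.9, "r5": 0.6},
}

if __name__ == "__main__":
    print(call_genotype(toy))
```

In this toy input, A*01:02 has the second-highest first-pass total, but almost all of its reads also align to A*01:01. After weighting, those shared reads count for roughly half, so A*02:01, which is supported mostly by reads of its own, becomes the second allele.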