US 10,892,036 B1
Systems and methods for determining the identity of alleles from genomic sequencing data
Mauricio Carneiro, Mountain View, CA (US); Mark DePristo, Palo Alto, CA (US); and Ryan Poplin, Sunnyvale, CA (US)
Assigned to Verily Life Sciences LLC, South San Francisco, CA (US)
Filed by Verily Life Sciences LLC, Mountain View, CA (US)
Filed on Aug. 2, 2017, as Appl. No. 15/666,924.
Claims priority of provisional application 62/369,809, filed on Aug. 2, 2016.
Int. Cl. G01N 33/48 (2006.01); G01N 33/50 (2006.01); G16B 30/00 (2019.01)
CPC G16B 30/00 (2019.02)
13 Claims
 
1. A method for identifying an allele within a genomic sample, the method comprising:
obtaining, by a computing device, a plurality of paired-end fragments from a genomic sample;
extracting, by the computing device, a group of nucleotide substrings from each paired-end fragment;
comparing, by the computing device, each nucleotide substring within the group of nucleotide substrings to reference nucleotide substrings within an index that provides a mapping between each reference nucleotide substring and an allele that contains the reference nucleotide substring;
identifying, by the computing device, for each nucleotide substring, a subset of alleles that contain the nucleotide substring based on the comparing and the mapping;
identifying, by the computing device, alleles that are present in two or more of the identified subsets of alleles;
determining, by the computing device, for each identified allele, a probability that the genomic sample comprises the allele based on a number of the plurality of paired-end fragments that include a nucleotide substring from the allele; and
identifying, by the computing device, that an allele is within the genomic sample when the allele has the greatest determined probability among the determined probabilities for the identified alleles.
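
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. Several details are assumptions made only for illustration: the substring length `K`, the toy allele names and sequences, the function names (`kmers`, `build_index`, `call_allele`), the reading of "two or more of the identified subsets" as applying within a single fragment, and the use of a normalized fragment count as the probability (the claim does not specify a formula). Mate orientation (reverse complement) is ignored for brevity.

```python
from collections import Counter, defaultdict

K = 4  # hypothetical substring length; the claim does not fix a value


def kmers(seq, k=K):
    """Return the set of all length-k nucleotide substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(alleles, k=K):
    """Map each reference substring to the alleles that contain it."""
    index = defaultdict(set)
    for name, seq in alleles.items():
        for s in kmers(seq, k):
            index[s].add(name)
    return index


def call_allele(fragments, index, k=K):
    """Return (best_allele, probabilities) for a list of (read1, read2) pairs."""
    support = Counter()  # number of fragments supporting each allele
    for read1, read2 in fragments:
        # One subset of alleles per substring that appears in the index.
        subsets = [index[s] for s in kmers(read1, k) | kmers(read2, k)
                   if s in index]
        # Keep alleles that occur in two or more of those subsets
        # (assumed here to be evaluated per fragment).
        seen = Counter(a for subset in subsets for a in subset)
        for allele, n in seen.items():
            if n >= 2:
                support[allele] += 1
    total = sum(support.values())
    if total == 0:
        return None, {}
    # Assumed probability model: share of supporting fragments.
    probs = {a: c / total for a, c in support.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Toy data, invented for illustration only.
alleles = {"A*01": "ACGTACGGTTCA", "A*02": "ACGTTTGGCCAA"}
fragments = [
    ("ACGTAC", "GGTTCA"),
    ("CGTACG", "CGGTTC"),
    ("ACGTTT", "GGCCAA"),
]
index = build_index(alleles)
best, probs = call_allele(fragments, index)
```

With this toy input, two fragments support the hypothetical allele "A*01" and one supports "A*02", so `best` is "A*01". A substring shared by both alleles (here "ACGT") alone does not make a fragment count toward an allele, because the allele must appear in at least two subsets.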