US 12,334,191 B2
Haplotype phasing models
Catherine Ann Ball, Mountain View, CA (US); Keith D. Noto, San Francisco, CA (US); Kenneth G. Chahine, Park City, UT (US); Mathew J. Barber, Chicago, IL (US); and Yong Wang, Foster City, CA (US)
Assigned to Ancestry.com DNA, LLC, Lehi, UT (US)
Filed by Ancestry.com DNA, LLC, Lehi, UT (US)
Filed on Apr. 29, 2020, as Appl. No. 16/862,266.
Application 16/862,266 is a continuation of application No. 15/519,099, granted, now 10,679,729, previously published as PCT/US2015/056164, filed on Oct. 19, 2015.
Claims priority of provisional application 62/065,726, filed on Oct. 19, 2014.
Claims priority of provisional application 62/065,557, filed on Oct. 17, 2014.
Prior Publication US 2020/0303035 A1, Sep. 24, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G16B 20/20 (2019.01); G06F 17/18 (2006.01); G16B 5/00 (2019.01); G16B 5/20 (2019.01); G16B 20/00 (2019.01); G16B 40/00 (2019.01); G16B 40/20 (2019.01); G16B 40/30 (2019.01)
CPC G16B 20/20 (2019.02) [G06F 17/18 (2013.01); G16B 5/00 (2019.02); G16B 5/20 (2019.02); G16B 20/00 (2019.02); G16B 40/00 (2019.02); G16B 40/20 (2019.02); G16B 40/30 (2019.02)] 20 Claims
 
1. A computer-implemented method for phasing diploid genotypes, the computer-implemented method comprising:
accessing a set of reference haplotypes corresponding to reference diploid genotypes that have already been phased;
accessing an input sample of a diploid genotype;
iteratively updating, using one or more processors, a set of directed acyclic models based on the diploid genotype and the reference haplotypes, each directed acyclic model corresponding to a different window of single nucleotide polymorphisms (SNPs), at least one of the directed acyclic models comprising a set of nodes, wherein at least one node at a first level is a parent node that represents a particular haplotype sequence at the first level, the parent node having a first edge connected to a first sub node at a second level and a second edge connected to a second sub node at the second level, the first sub node representing a major allele at the second level and the second sub node representing a minor allele at the second level, and wherein iteratively updating the set of directed acyclic models comprises:
in each iteration,
applying the set of directed acyclic models to the input sample of the diploid genotype;
obtaining phasings of the input sample of the diploid genotype, the obtained phasings comprising a set of pairs of haplotypes of the input sample;
selecting, from the set of pairs of haplotypes of the input sample, a subset of pairs of haplotypes;
updating the set of reference haplotypes by adding the selected subset of pairs of haplotypes of the input sample; and
updating at least one of the set of directed acyclic models using the updated set of reference haplotypes;
determining phasings of the diploid genotype using the one or more processors executing the set of directed acyclic models, the determining comprising:
applying the set of updated directed acyclic models to the input sample;
receiving output from each of the set of updated directed acyclic models, each output comprising at least a pair of phased haplotypes for the input sample; and
concatenating the received pairs of phased haplotypes to generate a single pair of phased haplotypes for the input sample; and
returning phased haplotypes for the input sample of the diploid genotype.
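The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several parts are assumptions made for illustration. The window model is a plain prefix tree, a simple directed acyclic graph in which each node stands for a haplotype prefix and has at most one edge for the major allele (coded 0) and one for the minor allele (coded 1). Window probabilities use a hypothetical pseudo-count `alpha`. The selection rule keeps the top `keep_fraction` of samples by model score, and the iteration count is `n_iter`. Each iteration adds the selected pairs to the original reference panel rather than letting them accumulate. Genotypes are coded as minor-allele counts (0, 1, 2), and windows are hypothetical (start, end) index pairs. All class and function names are hypothetical.

```python
from itertools import product
from math import log


class Node:
    """A node in a window model; it stands for the haplotype prefix leading to it."""

    def __init__(self):
        self.count = 0      # reference haplotypes passing through this node
        self.children = {}  # edge label: 0 = major allele, 1 = minor allele


class WindowModel:
    """Simplified directed acyclic model for one SNP window (a prefix tree).

    Beagle-style models also merge similar nodes; this sketch does not.
    """

    def __init__(self, haplotypes, alpha=0.5):
        self.root = Node()
        self.alpha = alpha  # hypothetical pseudo-count so unseen paths keep nonzero probability
        for hap in haplotypes:
            node = self.root
            node.count += 1
            for allele in hap:
                node = node.children.setdefault(allele, Node())
                node.count += 1

    def log_prob(self, hap):
        """Log-probability of walking the edges labelled by `hap` from the root."""
        node, total = self.root, 0.0
        for allele in hap:
            parent_count = node.count if node is not None else 0
            child = node.children.get(allele) if node is not None else None
            child_count = child.count if child is not None else 0
            total += log((child_count + self.alpha) / (parent_count + 2 * self.alpha))
            node = child
        return total


def phase_window(model, genotype):
    """Most probable pair (h1, h2) with h1[i] + h2[i] == genotype[i]."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    best, best_score = None, float("-inf")
    # Pin the first heterozygous site to 1 in h1 so mirror-image pairs are scored once.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        h1 = [g // 2 for g in genotype]  # 0 -> 0, 2 -> 1; het sites are filled below
        for i, b in zip(het, (1,) + bits):
            h1[i] = b
        h2 = tuple(g - a for g, a in zip(genotype, h1))
        score = model.log_prob(h1) + model.log_prob(h2)
        if score > best_score:
            best, best_score = (tuple(h1), h2), score
    return best, best_score


def build_models(reference, windows):
    """One window model per (start, end) SNP window."""
    return [WindowModel([hap[s:e] for hap in reference]) for s, e in windows]


def phase_sample(models, windows, genotype):
    """Phase each window, then concatenate the per-window pairs (no orientation alignment)."""
    h1, h2, score = [], [], 0.0
    for model, (s, e) in zip(models, windows):
        (a, b), window_score = phase_window(model, genotype[s:e])
        h1.extend(a)
        h2.extend(b)
        score += window_score
    return (tuple(h1), tuple(h2)), score


def iterative_phase(reference, samples, windows, n_iter=3, keep_fraction=0.5):
    """Phase samples, feed the best-scoring phasings back as references, repeat."""
    base = list(reference)
    models = build_models(base, windows)
    for _ in range(n_iter):
        results = [phase_sample(models, windows, g) for g in samples]
        # Hypothetical selection rule: keep the highest-scoring fraction of samples.
        ranked = sorted(results, key=lambda r: r[1], reverse=True)
        chosen = ranked[: max(1, int(len(ranked) * keep_fraction))]
        # Add the chosen pairs to the original panel (not cumulatively) and rebuild.
        updated = base + [hap for pair, _ in chosen for hap in pair]
        models = build_models(updated, windows)
    # Final pass with the updated models returns one concatenated pair per sample.
    return [phase_sample(models, windows, g)[0] for g in samples]


reference = [(0, 0, 0, 1, 1, 1), (1, 1, 1, 0, 0, 0)] * 2
samples = [(1, 1, 1, 1, 1, 1), (0, 0, 0, 2, 2, 2)]
print(iterative_phase(reference, samples, windows=[(0, 3), (3, 6)]))
# → [((1, 1, 1, 1, 1, 1), (0, 0, 0, 0, 0, 0)), ((0, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 1))]
```

With this toy panel, the all-heterozygous sample comes back as (1, 1, 1, 1, 1, 1)/(0, 0, 0, 0, 0, 0) rather than the panel's (0, 0, 0, 1, 1, 1)/(1, 1, 1, 0, 0, 0). The sketch concatenates window outputs as the claim states, but it does not decide which haplotype in one window continues which haplotype in the next. A practical version would need some way to align orientation across window boundaries, for example overlapping windows.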