CPC G16B 20/20 (2019.02) [G06F 17/18 (2013.01); G16B 5/00 (2019.02); G16B 5/20 (2019.02); G16B 20/00 (2019.02); G16B 40/00 (2019.02); G16B 40/20 (2019.02); G16B 40/30 (2019.02)]
20 Claims
1. A computer-implemented method for phasing diploid genotypes, the computer-implemented method comprising:
accessing a set of reference haplotypes corresponding to reference diploid genotypes that have already been phased;
accessing an input sample of a diploid genotype;
iteratively updating, using one or more processors, a set of directed acyclic models based on the diploid genotype and the reference haplotypes, each directed acyclic model corresponding to a different window of single nucleotide polymorphisms (SNPs), at least one of the directed acyclic models comprising a set of nodes, wherein at least one node at a first level is a parent node that represents a particular haplotype sequence at the first level, the parent node having a first edge connected to a first sub node at a second level and a second edge connected to a second sub node at the second level, the first sub node representing a major allele at the second level and the second sub node representing a minor allele at the second level, and wherein iteratively updating the set of directed acyclic models comprises:
in each iteration,
applying the set of directed acyclic models to the input sample of the diploid genotype;
obtaining phasings of the input sample of the diploid genotype, the obtained phasings comprising a set of pairs of haplotypes of the input sample;
selecting, from the set of pairs of haplotypes of the input sample, a subset of the pairs of haplotypes;
updating the set of reference haplotypes by adding the selected subset of pairs of haplotypes; and
updating at least one of the set of directed acyclic models using the updated set of reference haplotypes;
determining phasings of the diploid genotype using the one or more processors executing the set of directed acyclic models, the determining comprising:
applying the set of updated directed acyclic models to the input sample;
receiving output from each of the set of updated directed acyclic models, each output comprising at least a pair of phased haplotypes for the input sample; and
concatenating the received pairs of phased haplotypes to generate a single pair of phased haplotypes for the input sample; and
returning phased haplotypes for the input sample of the diploid genotype.
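The claim recites the method only in functional terms. The Python sketch below illustrates one possible reading, under several loudly stated assumptions that are not in the claim: genotypes are coded as minor-allele counts (0, 1, 2); haplotypes are coded 0 (major allele) or 1 (minor allele) per SNP; windows are fixed-size and non-overlapping; each per-window directed acyclic model is a prefix tree (the simplest DAG with the parent, major-allele sub node, and minor-allele sub node shape the claim describes; no node merging is shown); and haplotype probabilities use a pseudocount-smoothed product of edge frequencies. The names `Node`, `WindowModel`, `build_models`, `phase_sample`, `iterative_phase`, `alpha`, and `keep`, and the rule of keeping the highest-scoring fraction of phased pairs, are all hypothetical; the claim specifies no selection criterion, scoring function, or window size.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Sequence, Tuple

Haplotype = Tuple[int, ...]  # one entry per SNP: 0 = major allele, 1 = minor allele
Pair = Tuple[Haplotype, Haplotype]


@dataclass
class Node:
    """A node at one level; its path from the root is the haplotype prefix it represents."""
    count: int = 0  # reference haplotypes passing through this node
    children: Dict[int, "Node"] = field(default_factory=dict)  # 0: major-allele sub node, 1: minor


class WindowModel:
    """Hypothetical prefix-tree haplotype model for one SNP window (a tree is the simplest DAG)."""

    def __init__(self, reference: Sequence[Haplotype], alpha: float = 1.0):
        self.alpha = alpha  # assumed pseudocount so unseen haplotypes keep nonzero probability
        self.root = Node()
        for hap in reference:
            node = self.root
            node.count += 1
            for allele in hap:
                node = node.children.setdefault(allele, Node())
                node.count += 1

    def prob(self, hap: Haplotype) -> float:
        """Product of smoothed major/minor edge frequencies along the haplotype's path."""
        p, node = 1.0, self.root
        for allele in hap:
            parent_count = node.count if node is not None else 0
            child = node.children.get(allele) if node is not None else None
            child_count = child.count if child is not None else 0
            p *= (child_count + self.alpha) / (parent_count + 2 * self.alpha)
            node = child
        return p

    def phase(self, genotype: Sequence[int]) -> Tuple[Pair, float]:
        """Best-scoring haplotype pair consistent with a 0/1/2 minor-allele-count genotype."""
        het = [i for i, g in enumerate(genotype) if g == 1]
        best, best_score = None, -1.0
        for bits in product((0, 1), repeat=len(het)):  # 2**k candidates for k heterozygous SNPs
            h1 = [g // 2 for g in genotype]  # homozygous SNPs are fixed: 0 -> 0, 2 -> 1
            for i, b in zip(het, bits):
                h1[i] = b
            h2 = tuple(g - a for g, a in zip(genotype, h1))  # complementary haplotype
            score = self.prob(tuple(h1)) * self.prob(h2)
            if score > best_score:
                best, best_score = (tuple(h1), h2), score
        return best, best_score


def build_models(reference: Sequence[Haplotype], n_snps: int, window: int) -> List[WindowModel]:
    """One model per non-overlapping window of `window` SNPs."""
    return [WindowModel([h[s:s + window] for h in reference]) for s in range(0, n_snps, window)]


def phase_sample(genotype: Sequence[int], models: List[WindowModel], window: int) -> Tuple[Pair, float]:
    """Apply every window model, then concatenate the per-window pairs into one pair.

    Simplification: pair orientation is NOT reconciled across window boundaries.
    """
    hap_a: List[int] = []
    hap_b: List[int] = []
    score = 1.0
    for w, model in enumerate(models):
        (h1, h2), s = model.phase(genotype[w * window:(w + 1) * window])
        hap_a.extend(h1)
        hap_b.extend(h2)
        score *= s
    return (tuple(hap_a), tuple(hap_b)), score


def iterative_phase(genotypes: Sequence[Sequence[int]], reference: Sequence[Haplotype],
                    window: int, iterations: int = 3, keep: float = 0.5) -> List[Pair]:
    """Phase, add a selected subset of phased pairs to the reference, rebuild; then phase again."""
    n_snps = len(genotypes[0])
    models = build_models(reference, n_snps, window)
    for _ in range(iterations):
        results = [phase_sample(g, models, window) for g in genotypes]
        # Hypothetical selection rule: keep the highest-scoring fraction of phased pairs.
        ranked = sorted(results, key=lambda r: r[1], reverse=True)
        selected = ranked[:max(1, int(keep * len(ranked)))]
        # Assumed update: original reference plus this iteration's selected haplotypes.
        augmented = list(reference) + [h for pair, _ in selected for h in pair]
        models = build_models(augmented, n_snps, window)
    return [phase_sample(g, models, window)[0] for g in genotypes]
```

For example, with `reference = [(0, 0, 1, 1), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 0)]`, the call `iterative_phase([(1, 1, 1, 1), (0, 0, 2, 2)], reference, window=4)` returns `[((0, 0, 1, 1), (1, 1, 0, 0)), ((0, 0, 1, 1), (0, 0, 1, 1))]`. With `window=2`, however, `phase_sample((1, 1, 1, 1), build_models(reference, 4, 2), 2)` joins the two window pairs into `(0, 0, 0, 0)` and `(1, 1, 1, 1)`, a switch error relative to every reference haplotype. Plain concatenation therefore needs some way of orienting adjacent windows (for example, overlapping windows), which this sketch does not attempt. Enumerating all 2**k consistent pairs per window is also only practical for small windows.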