US 12,086,735 B2
Local genetic ethnicity determination system
Keith D. Noto, San Francisco, CA (US); and Yong Wang, Foster City, CA (US)
Assigned to Ancestry.com DNA, LLC, Lehi, UT (US)
Filed by Ancestry.com DNA, LLC, Lehi, UT (US)
Filed on Jan. 8, 2020, as Appl. No. 16/737,269.
Application 16/737,269 is a continuation of application No. 15/209,458, filed on Jul. 13, 2016, granted, now 10,558,930.
Claims priority of provisional application 62/191,968, filed on Jul. 13, 2015.
Prior Publication US 2020/0160202 A1, May 21, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 7/01 (2023.01); G06N 20/00 (2019.01); G16B 10/00 (2019.01); G16B 40/00 (2019.01); G16B 40/20 (2019.01)
CPC G06N 7/01 (2023.01) [G06N 20/00 (2019.01); G16B 10/00 (2019.02); G16B 40/20 (2019.02); G16B 40/00 (2019.02)] 20 Claims
 
1. A computer-implemented method comprising:
accessing an input sample genetic dataset of an individual;
dividing the input sample genetic dataset into a plurality of windows, each window comprising a set of a plurality of single nucleotide polymorphisms (SNPs);
generating, using the divided input sample genetic dataset, an inter-window hidden Markov model (HMM), wherein the inter-window HMM comprises:
(i) for each window, a set of nodes representing the window, each node in the set corresponding to a pair of labels and associated with an emission probability, each label in the pair representing an ethnicity label for the plurality of SNPs included in the window;
(ii) a plurality of edges, each edge connecting a first node of a first set of nodes representing a first window to a second node of a second set of nodes representing a second window, each edge representing a transition from the first node to the second node;
and wherein the inter-window HMM is trained by:
receiving haplotype data corresponding to sequences of alleles of individuals;
building per-window models for the plurality of windows;
receiving a set of reference panel samples; and
training the per-window models using the set of reference panel samples to generate the emission probability for each node of each window in the inter-window HMM; and
assigning one or more ethnicity labels to the input sample genetic dataset using the inter-window HMM.
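The windowing and training steps of claim 1 can be illustrated with a minimal Python sketch, assuming a deliberately simple per-window model, because the claim does not say what form the per-window models take. In this sketch each ethnicity label gets a smoothed alternate-allele frequency per SNP, estimated from reference panel haplotypes. A node's emission probability for an unordered label pair is then the product, over the window's SNPs, of the probability of the observed unphased genotype (0, 1 or 2 alternate alleles) when one haplotype is drawn from each label. The genotype encoding, the reference-panel format, the pseudocount, and all function names (`split_into_windows`, `train_allele_freqs`, `window_log_emissions`) are hypothetical. Emissions are kept in log space to avoid underflow.

```python
import math
from itertools import combinations_with_replacement


def split_into_windows(n_snps, window_size):
    """Split SNP indices 0..n_snps-1 into consecutive (start, end) windows."""
    return [(s, min(s + window_size, n_snps)) for s in range(0, n_snps, window_size)]


def train_allele_freqs(reference_haplotypes, pseudocount=1.0):
    """Estimate a smoothed alternate-allele frequency per SNP for each label.

    reference_haplotypes maps an ethnicity label to a list of haplotypes,
    each a list of 0/1 alleles (hypothetical reference-panel format).
    The pseudocount keeps every frequency strictly between 0 and 1.
    """
    freqs = {}
    for label, haps in reference_haplotypes.items():
        n = len(haps)
        freqs[label] = [
            (sum(h[j] for h in haps) + pseudocount) / (n + 2 * pseudocount)
            for j in range(len(haps[0]))
        ]
    return freqs


def genotype_prob(g, pa, pb):
    """P(unphased genotype g) when one haplotype comes from each label."""
    if g == 0:
        return (1 - pa) * (1 - pb)
    if g == 1:
        return pa * (1 - pb) + (1 - pa) * pb
    return pa * pb


def window_log_emissions(genotypes, windows, freqs):
    """Log emission probability of every unordered label pair in every window."""
    pairs = list(combinations_with_replacement(sorted(freqs), 2))
    table = []
    for start, end in windows:
        table.append({
            (a, b): sum(
                math.log(genotype_prob(genotypes[j], freqs[a][j], freqs[b][j]))
                for j in range(start, end)
            )
            for a, b in pairs
        })
    return table


# Toy reference panel: label "A" carries allele 0 everywhere, "B" allele 1.
panel = {"A": [[0, 0, 0, 0], [0, 0, 0, 0]], "B": [[1, 1, 1, 1], [1, 1, 1, 1]]}
freqs = train_allele_freqs(panel)
windows = split_into_windows(4, window_size=2)
emissions = window_log_emissions([0, 0, 2, 2], windows, freqs)
best = [max(row, key=row.get) for row in emissions]
print(windows, best)
```

On this toy input the first window scores highest for the node ("A", "A") and the second for ("B", "B"). A real system would use far larger windows and reference panels, and likely a richer per-window model than independent allele frequencies.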
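The final assigning step can be sketched as Viterbi decoding over the inter-window HMM. In this graph each window contributes one node per label pair, and edges connect every node of one window to every node of the next. The claim does not specify the transition probabilities or the decoding algorithm. The "sticky" transition model, the uniform start distribution, the `switch_prob` parameter, and the function name `viterbi_label_pairs` below are therefore assumptions for illustration, and the log emission values are arbitrary toy numbers rather than output from any trained model.

```python
import math


def viterbi_label_pairs(log_emissions, switch_prob=0.01):
    """Most likely sequence of label-pair nodes across consecutive windows.

    log_emissions: one dict per window mapping a label pair (node) to its
    log emission probability. Transitions (edges) follow a hypothetical
    "sticky" model: stay on the same pair with probability 1 - switch_prob,
    otherwise move to any other pair with equal probability.
    """
    states = list(log_emissions[0])
    k = len(states)
    log_stay = math.log(1 - switch_prob)
    log_move = math.log(switch_prob / (k - 1)) if k > 1 else -math.inf

    def log_trans(u, v):
        return log_stay if u == v else log_move

    # Uniform start distribution over the nodes of the first window.
    score = {s: -math.log(k) + log_emissions[0][s] for s in states}
    backptrs = []
    for emis in log_emissions[1:]:
        new_score, ptr = {}, {}
        for v in states:
            # Best predecessor node in the previous window for node v.
            best_u = max(states, key=lambda u: score[u] + log_trans(u, v))
            new_score[v] = score[best_u] + log_trans(best_u, v) + emis[v]
            ptr[v] = best_u
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best node in the last window.
    path = [max(states, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


AA, AB, BB = ("A", "A"), ("A", "B"), ("B", "B")
# Window 2 weakly prefers (A, B); the transition penalty should override it.
noisy = [
    {AA: -1.0, AB: -5.0, BB: -9.0},
    {AA: -1.0, AB: -5.0, BB: -9.0},
    {AA: -2.0, AB: -1.5, BB: -9.0},
    {AA: -1.0, AB: -5.0, BB: -9.0},
]
print(viterbi_label_pairs(noisy))
```

A `switch_prob` close to zero makes label changes between adjacent windows expensive, which smooths over isolated noisy windows; larger values let the decoded labels follow the per-window emission evidence more closely. A forward-backward pass over the same graph would instead yield per-window posterior probabilities for each label pair.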