US 12,437,214 B2
Machine-learning system and method for identifying same person in genealogical databases
Atanu Roy, Lehi, UT (US); Jianlong Qi, Lehi, UT (US); Peng Jiang, Lehi, UT (US); Aaron Ling, Lehi, UT (US); Rey Furner, Lehi, UT (US); Lei Wu, Lehi, UT (US); Eugene Greenwood, Lehi, UT (US); and Ian Stiles, Lehi, UT (US)
Assigned to Ancestry.com Operations Inc., Lehi, UT (US)
Filed by Ancestry.com Operations Inc., Lehi, UT (US)
Filed on Aug. 3, 2021, as Appl. No. 17/392,695.
Application 17/392,695 is a continuation of application No. 15/479,291, filed on Apr. 5, 2017, granted, now 11,113,609.
Claims priority of provisional application 62/393,849, filed on Sep. 13, 2016.
Claims priority of provisional application 62/393,276, filed on Sep. 12, 2016.
Claims priority of provisional application 62/319,299, filed on Apr. 7, 2016.
Prior Publication US 2021/0365803 A1, Nov. 25, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 5/02 (2023.01); G06F 16/00 (2019.01); G06N 5/025 (2023.01)
CPC G06N 5/025 (2013.01) [G06F 16/00 (2019.01)]
20 Claims
 
1. A computer-implemented method comprising:
identifying a first tree person from a first genealogical tree and a second tree person from a second genealogical tree, wherein both the first genealogical tree and the second genealogical tree comprise a plurality of interconnected tree persons corresponding to individuals that are related to each other;
extracting, from first tree data of the first genealogical tree, a first set of features for the first tree person and, from second tree data of the second genealogical tree, a second set of features for the second tree person;
based on extracting the first set of features for the first tree person and the second set of features for the second tree person, generating a metric function, by comparing like features from the first set of features for the first tree person with corresponding features from the second set of features for the second tree person;
generating a plurality of feature weights for similarity metrics of the metric function using a machine learning model configured to output the plurality of feature weights based on receiving an input comprising the first set of features and the second set of features, wherein the machine learning model is trained by:
providing training data comprising pairs of tree persons to the machine learning model; and
modifying the machine learning model using an error computed based on an output of the machine learning model when provided with the training data;
generating a plurality of weighted similarity metrics by multiplying similarity metrics of the metric function with corresponding feature weights from the plurality of feature weights;
generating a similarity score indicating a likelihood of the first tree person and the second tree person being duplicates by calculating a sum of the plurality of weighted similarity metrics; and
modifying a cluster in a genealogical database based on the likelihood of the first tree person and the second tree person being duplicates.
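
The scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`TreePerson`, `extract_features`, `similarity_metrics`, `similarity_score`, `train_step`, `update_clusters`), the four chosen features, the hand-set weights, and the 0.8 threshold are assumptions made for illustration. In particular, the claim has a machine learning model output the feature weights from the two feature sets; the sketch substitutes one fixed weight table and a simple linear error-driven update to show the general shape of "modifying the model using an error".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreePerson:
    """Hypothetical record for one person node in a genealogical tree."""
    person_id: str
    first_name: str
    last_name: str
    birth_year: int
    birth_place: str


def extract_features(person):
    """Extract a normalized feature set from a tree person (assumed features)."""
    return {
        "first_name": person.first_name.strip().lower(),
        "last_name": person.last_name.strip().lower(),
        "birth_year": person.birth_year,
        "birth_place": person.birth_place.strip().lower(),
    }


def similarity_metrics(f1, f2):
    """Compare like features; each metric is a value in [0, 1]."""
    year_gap = abs(f1["birth_year"] - f2["birth_year"])
    return {
        "first_name": 1.0 if f1["first_name"] == f2["first_name"] else 0.0,
        "last_name": 1.0 if f1["last_name"] == f2["last_name"] else 0.0,
        "birth_year": max(0.0, 1.0 - year_gap / 10.0),
        "birth_place": 1.0 if f1["birth_place"] == f2["birth_place"] else 0.0,
    }


def similarity_score(metrics, weights):
    """Multiply each metric by its feature weight, then sum the products."""
    weighted = {name: metrics[name] * weights[name] for name in metrics}
    return sum(weighted.values())


def train_step(weights, metrics, label, lr=0.1):
    """One simplified update: nudge weights by the error on a labeled pair.

    Assumption: a plain linear model stands in for the claimed ML model.
    """
    error = label - similarity_score(metrics, weights)
    return {name: w + lr * error * metrics[name] for name, w in weights.items()}


def update_clusters(clusters, id_a, id_b, score, threshold=0.8):
    """Merge the clusters holding id_a and id_b when the score clears the threshold."""
    if score < threshold:
        return clusters
    merged = {id_a, id_b}
    untouched = []
    for cluster in clusters:
        if cluster & {id_a, id_b}:
            merged |= cluster
        else:
            untouched.append(cluster)
    return untouched + [merged]


# Hand-set weights: a stand-in for the weights the trained model would output.
weights = {"first_name": 0.3, "last_name": 0.3, "birth_year": 0.2, "birth_place": 0.2}

a = TreePerson("t1:p7", "John", "Smith", 1850, "Ohio")
b = TreePerson("t2:p3", "john", "Smith ", 1852, "Ohio")

metrics = similarity_metrics(extract_features(a), extract_features(b))
score = similarity_score(metrics, weights)  # roughly 0.96 for this pair
clusters = update_clusters([{"t1:p7"}, {"t2:p3"}, {"t3:p1"}], a.person_id, b.person_id, score)
```

In this sketch the two example persons differ only in letter case, stray whitespace, and a two-year gap in birth year, so the weighted sum clears the assumed threshold and their singleton clusters are merged while the unrelated cluster is left alone.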