CPC G06F 16/906 (2019.01) [G06F 16/215 (2019.01); G06F 16/2246 (2019.01); G06F 16/258 (2019.01); G06F 16/287 (2019.01); G06F 18/22 (2023.01); G06N 3/045 (2023.01); G06N 20/20 (2019.01)] | 20 Claims |
1. A computer-implemented method comprising:
extracting a first set of features from a first set of tree data and a second set of features from a second set of tree data, wherein the first set of tree data corresponds to a first tree person from a first genealogical tree and the second set of tree data corresponds to a second tree person from a second genealogical tree, and wherein each of the first genealogical tree and the second genealogical tree comprises a plurality of interconnected nodes representing relationships between tree persons;
generating, utilizing a feature comparator to compare the first set of features and the second set of features:
an individual-level similarity vector comprising paired individual-level features from paired tree persons, wherein the paired tree persons include: i) one or more of the first tree person or a first relative related to the first tree person and ii) one or more of the second tree person or a second relative related to the second tree person; and
a family-level similarity score from features across familial relationships of the paired tree persons;
generating, utilizing an individual-level machine learning model to analyze the individual-level similarity vector, an individual-level similarity score defining a similarity between the paired tree persons;
determining, utilizing a family-level machine learning model based on the individual-level similarity score and the family-level similarity score, that the first tree person and the second tree person are duplicates of a single individual; and
modifying a cluster database based on determining that the first tree person and the second tree person are duplicates of the single individual.
|
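The steps recited in claim 1 can be illustrated with a minimal Python sketch. Nothing below is taken from the patent beyond the order of the steps: the `TreePerson` fields, the two comparison features, the logistic weights, the fixed 0.7/0.3 combination, and the dictionary used as a cluster database are all hypothetical stand-ins chosen for illustration. In particular, both "models" are hand-set functions standing in for trained machine learning models.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TreePerson:
    """Hypothetical node of a genealogical tree (fields are assumptions)."""
    person_id: str
    name: str
    birth_year: int
    # Maps a relationship role (e.g. "father") to the related TreePerson.
    relatives: dict = field(default_factory=dict)


def compare_individual(a, b):
    """Feature comparator: build an individual-level similarity vector."""
    name_match = 1.0 if a.name.lower() == b.name.lower() else 0.0
    year_sim = max(0.0, 1.0 - abs(a.birth_year - b.birth_year) / 10.0)
    return [name_match, year_sim]


def individual_model(vec, weights=(3.0, 2.0), bias=-2.5):
    """Stand-in for the individual-level model: a hand-set logistic function."""
    z = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def family_level_score(a, b):
    """Family-level score: overlap (Jaccard) of relationship roles present."""
    roles_a, roles_b = set(a.relatives), set(b.relatives)
    union = roles_a | roles_b
    return len(roles_a & roles_b) / len(union) if union else 0.0


def is_duplicate(a, b, threshold=0.5):
    """Stand-in for the family-level model: combine both kinds of score."""
    # Score the two persons themselves, then each pair of same-role relatives.
    scores = [individual_model(compare_individual(a, b))]
    for role in set(a.relatives) & set(b.relatives):
        pair_vec = compare_individual(a.relatives[role], b.relatives[role])
        scores.append(individual_model(pair_vec))
    mean_individual = sum(scores) / len(scores)
    combined = 0.7 * mean_individual + 0.3 * family_level_score(a, b)
    return combined >= threshold


def merge_into_cluster(cluster_db, a, b):
    """Modify the cluster database so both persons share one cluster id."""
    cluster_id = cluster_db.setdefault(a.person_id, a.person_id)
    cluster_db[b.person_id] = cluster_id
    return cluster_db


# Usage: two records of the same person in different trees, plus an unrelated one.
father_1 = TreePerson("t1-p2", "William Smith", 1820)
father_2 = TreePerson("t2-p9", "William Smith", 1820)
john_1 = TreePerson("t1-p1", "John Smith", 1850, {"father": father_1})
john_2 = TreePerson("t2-p8", "john smith", 1851, {"father": father_2})
mary = TreePerson("t3-p1", "Mary Jones", 1900)

cluster_db = {}
if is_duplicate(john_1, john_2):
    merge_into_cluster(cluster_db, john_1, john_2)
```

In this sketch the database is modified only after the duplicate determination succeeds, mirroring the final step of the claim. A real system would presumably replace the hand-set functions with trained models and a persistent store.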