US 12,461,944 B2
Multi-cluster duplicate record detection
Austin Smith, Wesley Chapel, FL (US); Stephen Wilbourn, Royse City, TX (US); Heath Hafner, Mahomet, IL (US); Peter R. Wenzel, Bloomington, IL (US); and Brian Setzler, Bloomington, IL (US)
Assigned to State Farm Mutual Automobile Insurance Company, Bloomington, IL (US)
Filed by State Farm Mutual Automobile Insurance Company, Bloomington, IL (US)
Filed on Mar. 25, 2024, as Appl. No. 18/615,663.
Prior Publication US 2025/0298812 A1, Sep. 25, 2025
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 16/2457 (2019.01); G06F 16/28 (2019.01)

CPC G06F 16/285 (2019.01) [G06F 16/215 (2019.01); G06F 16/24575 (2019.01)]

20 Claims

1. A multi-cluster data storage system, comprising:

a first computing cluster of a first datacenter, the first computing cluster comprising a first database instance executing on a first server, the first computing cluster storing a first set of records;

a second computing cluster of a second datacenter separate from the first datacenter, the second computing cluster comprising a second database instance executing on a second server, the second computing cluster storing a second set of records;

a search server executing separate from the first server and the second server, the search server storing a multi-cluster index having a sorted object identifier key; and

a duplicate record detector comprising one or more processors and one or more non-transitory computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:

retrieving a first set of records from the first computing cluster;

paring the first set of records, into a first pared subset of records, based at least in part on the multi-cluster index;

retrieving a second set of records from the second computing cluster;

paring the second set of records, into a second pared subset of records, based at least in part on the multi-cluster index; and

determining a duplicate record, based at least in part on comparing the first pared subset of records and the second pared subset of records.
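
The operations recited in claim 1 (retrieve, pare against the multi-cluster index, compare) can be illustrated with a minimal Python sketch. The claim does not specify how the index is built or how paring uses it, so everything here is an assumption made for illustration: records are modeled as dicts with a hypothetical `object_id` field, the index is modeled as a sorted list of object identifiers that appear in more than one cluster, and a duplicate is taken to be an identifier present in both pared subsets. The function names `build_multi_cluster_index`, `pare`, and `find_duplicates` are hypothetical and do not come from the patent.

```python
from bisect import bisect_left
from collections import Counter


def build_multi_cluster_index(clusters):
    """Return a sorted list of object IDs that occur in more than one cluster.

    Assumption: this stands in for the search server's multi-cluster index
    with a sorted object identifier key.
    """
    counts = Counter()
    for records in clusters.values():
        # Count each ID at most once per cluster.
        counts.update({r["object_id"] for r in records})
    return sorted(oid for oid, n in counts.items() if n > 1)


def in_index(index, object_id):
    """Binary search over the sorted key."""
    i = bisect_left(index, object_id)
    return i < len(index) and index[i] == object_id


def pare(records, index):
    """Keep only the records whose object ID appears in the index."""
    return [r for r in records if in_index(index, r["object_id"])]


def find_duplicates(first_pared, second_pared):
    """Return the sorted object IDs present in both pared subsets."""
    second_ids = {r["object_id"] for r in second_pared}
    return sorted({r["object_id"] for r in first_pared} & second_ids)


# Hypothetical sample data for two clusters in separate datacenters.
clusters = {
    "dc1": [{"object_id": "a"}, {"object_id": "b"}, {"object_id": "c"}],
    "dc2": [{"object_id": "b"}, {"object_id": "c"}, {"object_id": "d"}],
}
index = build_multi_cluster_index(clusters)
first_pared = pare(clusters["dc1"], index)
second_pared = pare(clusters["dc2"], index)
duplicates = find_duplicates(first_pared, second_pared)
```

Under these assumptions, paring discards records that exist in only one cluster before any comparison, so the comparison step works on the smaller pared subsets rather than the full record sets.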