CPC G06F 16/285 (2019.01) [G06F 16/211 (2019.01); G06F 16/215 (2019.01); G06F 16/2272 (2019.01); G06F 16/24556 (2019.01)] | 20 Claims |
1. A method comprising using at least one hardware processor to:
receive data comprising a plurality of firmographic records from a plurality of sources, wherein each of the plurality of firmographic records comprises a plurality of fields;
normalize the plurality of firmographic records into a common schema;
clean the plurality of firmographic records by replacing a value of each of one or more of the plurality of fields in one or more of the plurality of firmographic records with a value of that field in another one of the plurality of firmographic records;
cluster the plurality of firmographic records into a plurality of clusters, wherein each of the plurality of clusters comprises a subset of the plurality of firmographic records;
for each of the plurality of clusters, collapse the subset of firmographic records in that cluster into a single conflated firmographic record based on a voting process within that cluster;
generate a master identifier for each conflated firmographic record; and
merge the conflated firmographic records into a master firmographic database, comprising a plurality of mastered firmographic records, indexed by the master identifiers.
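The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how normalization, cleaning, clustering, voting, or identifier generation are carried out, so everything below is an assumption made for illustration only: the three-field schema, the `FIELD_ALIASES` mapping, clustering by a shared domain, per-field majority voting, and a truncated SHA-256 hash as the master identifier are all hypothetical choices, not the claimed method.

```python
import hashlib
from collections import Counter, defaultdict

# Hypothetical common schema and source-field aliases (not from the claim).
SCHEMA = ("name", "domain", "city")
FIELD_ALIASES = {
    "company": "name", "company_name": "name", "name": "name",
    "site": "domain", "website": "domain", "domain": "domain",
    "town": "city", "city": "city",
}

def normalize(record):
    """Map a source record onto the common schema; empty values become None."""
    out = dict.fromkeys(SCHEMA)
    for key, value in record.items():
        field = FIELD_ALIASES.get(key.lower())
        if field and value and str(value).strip():
            out[field] = str(value).strip().lower()
    return out

def clean(records):
    """Replace a missing field value with that field's value from another
    record sharing the same domain (one assumed reading of the clean step)."""
    by_domain = defaultdict(list)
    for r in records:
        if r["domain"]:
            by_domain[r["domain"]].append(r)
    for group in by_domain.values():
        for field in SCHEMA:
            donors = [r[field] for r in group if r[field]]
            if donors:
                for r in group:
                    if not r[field]:
                        r[field] = donors[0]
    return records

def cluster(records):
    """Group records by domain, falling back to name (assumed clustering key)."""
    clusters = defaultdict(list)
    for r in records:
        clusters[r["domain"] or r["name"]].append(r)
    return list(clusters.values())

def collapse(group):
    """Collapse a cluster into one conflated record by per-field majority vote."""
    conflated = {}
    for field in SCHEMA:
        votes = Counter(r[field] for r in group if r[field])
        conflated[field] = votes.most_common(1)[0][0] if votes else None
    return conflated

def master_id(record):
    """Derive a deterministic identifier from the conflated field values."""
    key = "|".join(record[f] or "" for f in SCHEMA)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

def build_master(raw_records):
    """Run the steps in order and return a dict indexed by master identifier."""
    records = clean([normalize(r) for r in raw_records])
    master = {}
    for group in cluster(records):
        conflated = collapse(group)
        master[master_id(conflated)] = conflated
    return master

# Made-up sample records from three differently shaped sources.
raw = [
    {"Company": "Acme Corp", "Website": "acme.example", "Town": "Austin"},
    {"company_name": "Acme Corp", "site": "acme.example", "city": ""},
    {"name": "ACME Corporation", "domain": "acme.example", "city": "Austin"},
    {"name": "Globex", "domain": "globex.example", "city": "Boston"},
]
master = build_master(raw)
```

In this sketch the three Acme records fall into one cluster and the majority name wins the vote, so `master` holds two mastered records keyed by their identifiers. A real system would likely use fuzzy matching for clustering and a weighted or source-aware vote; those details are outside what the claim text states.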