US 12,111,852 B2
Aggregation of noisy datasets into master firmographic database
Tai Vo, Orange, CA (US); Nitin Vijayvargiya, San Francisco, CA (US); Daniel Hsiung, San Jose, CA (US); Premal Shah, Union City, CA (US); Viral Bajaria, San Francisco, CA (US); and Akshara Palakodety, Mountain View, CA (US)
Assigned to 6SENSE INSIGHTS, INC., San Francisco, CA (US)
Filed by 6SENSE INSIGHTS, INC., San Francisco, CA (US)
Filed on Jul. 25, 2023, as Appl. No. 18/225,863.
Application 18/225,863 is a continuation of application No. 17/362,843, filed on Jun. 29, 2021, granted, now Pat. No. 11,755,625.
Claims priority of provisional application 63/045,707, filed on Jun. 29, 2020.
Prior Publication US 2023/0394070 A1, Dec. 7, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/28 (2019.01); G06F 16/21 (2019.01); G06F 16/215 (2019.01); G06F 16/22 (2019.01); G06F 16/2455 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/211 (2019.01); G06F 16/215 (2019.01); G06F 16/2272 (2019.01); G06F 16/24556 (2019.01)]
20 Claims
1. A method comprising using at least one hardware processor to:
receive data comprising a plurality of firmographic records from a plurality of sources, wherein each of the plurality of firmographic records comprises a plurality of fields;
normalize the plurality of firmographic records into a common schema;
clean the plurality of firmographic records by replacing a value of each of one or more of the plurality of fields in one or more of the plurality of firmographic records with a value of that field in another one of the plurality of firmographic records;
cluster the plurality of firmographic records into a plurality of clusters, wherein each of the plurality of clusters comprises a subset of the plurality of firmographic records;
for each of the plurality of clusters, collapse the subset of firmographic records in that cluster into a single conflated firmographic record based on a voting process within that cluster;
generate a master identifier for each conflated firmographic record; and
merge the conflated firmographic records into a master firmographic database, comprising a plurality of mastered firmographic records, indexed by the master identifiers.
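The steps recited in claim 1 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation and not a statement of claim scope. The claim does not specify the source names, field mappings, common schema (`name`, `domain`, `industry`), clustering key (normalized domain), majority vote, or hash-based identifier; all of these are hypothetical choices made here for the example.

```python
import hashlib
from collections import Counter, defaultdict

# Hypothetical per-source field mappings onto an assumed common schema.
FIELD_MAPS = {
    "source_a": {"company": "name", "website": "domain", "sector": "industry"},
    "source_b": {"org_name": "name", "url": "domain", "industry": "industry"},
}
SCHEMA = ("name", "domain", "industry")

def normalize(source, raw):
    """Map one raw record onto the common schema and canonicalize the domain."""
    rec = dict.fromkeys(SCHEMA)
    for src_field, value in raw.items():
        target = FIELD_MAPS[source].get(src_field)
        if target and value:
            rec[target] = value.strip()
    if rec["domain"]:
        d = rec["domain"].lower()
        for prefix in ("https://", "http://", "www."):
            if d.startswith(prefix):
                d = d[len(prefix):]
        rec["domain"] = d.rstrip("/")
    return rec

def clean(records):
    """Fill a missing field with the most common value of that field
    among other records sharing the same domain (assumed cleaning rule)."""
    by_domain = defaultdict(list)
    for r in records:
        if r["domain"]:
            by_domain[r["domain"]].append(r)
    cleaned = []
    for r in records:
        r = dict(r)  # copy so the input records are left unmodified
        for field in ("name", "industry"):
            if r[field] is None and r["domain"]:
                donors = [o[field] for o in by_domain[r["domain"]] if o[field]]
                if donors:
                    r[field] = Counter(donors).most_common(1)[0][0]
        cleaned.append(r)
    return cleaned

def cluster(records):
    """Group records by domain, falling back to the lowercased name."""
    clusters = defaultdict(list)
    for r in records:
        clusters[r["domain"] or (r["name"] or "").lower()].append(r)
    return list(clusters.values())

def collapse(group):
    """Collapse a cluster into one record by per-field majority vote."""
    conflated = {}
    for field in SCHEMA:
        votes = Counter(r[field] for r in group if r[field])
        conflated[field] = votes.most_common(1)[0][0] if votes else None
    return conflated

def master_id(record):
    """Derive a deterministic identifier from the conflated record (assumed scheme)."""
    basis = f"{record['domain']}|{record['name']}".lower()
    return hashlib.sha1(basis.encode("utf-8")).hexdigest()[:12]

def build_master(sourced_records):
    """Run normalize, clean, cluster, collapse, and merge keyed by master identifier."""
    normalized = [normalize(source, raw) for source, raw in sourced_records]
    master = {}
    for group in cluster(clean(normalized)):
        rec = collapse(group)
        master[master_id(rec)] = rec
    return master

# Made-up sample input: three noisy records for one company, one for another.
RAW = [
    ("source_a", {"company": "Acme Corp", "website": "https://www.acme.com/", "sector": "Manufacturing"}),
    ("source_b", {"org_name": "Acme Corp", "url": "acme.com", "industry": ""}),
    ("source_b", {"org_name": "ACME Corporation", "url": "http://acme.com", "industry": "Manufacturing"}),
    ("source_a", {"company": "Globex", "website": "globex.com", "sector": "Energy"}),
]
MASTER = build_master(RAW)
```

In this sketch the three Acme records normalize to the same domain, the empty industry value is filled from its cluster mates, and the vote selects the most frequent name, so `MASTER` holds two entries indexed by their identifiers. A production system would likely need fuzzy matching and source-weighted voting, neither of which is shown here.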