| CPC G06F 7/02 (2013.01) [G06F 16/152 (2019.01); G06F 16/313 (2019.01); G06F 16/90344 (2019.01); G06F 16/2458 (2019.01)] | 20 Claims |

|
1. A computer-implemented method for generating hash values to determine string similarity, the computer-implemented method comprising:
converting a first text string of a first data set into a first set of shingles;
determining a weight associated with each shingle in the first set of shingles, based, at least in part, on a particular record field associated with the shingle;
generating, based on a hash function, a hash value for each shingle in the first set of shingles;
reducing the hash value generated for each shingle in the first set of shingles, based, at least in part, on the weight associated with the shingle;
computing a hash signature value using the reduced hash value;
determining that two data records intersect according to the hash signature value; and
storing information about the intersection over a network, in a record database.
|
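The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading, not the claimed implementation: the claim does not say how a hash value is "reduced" by its weight, so the sketch assumes integer division by the weight, which makes shingles from higher-weighted fields more likely to supply the minimum. The field names, the weights in `FIELD_WEIGHTS`, the character-trigram shingling, the BLAKE2b hash, the MinHash-style signature, and the 0.5 threshold are all hypothetical choices made for illustration. The final storing step (writing the intersection to a record database over a network) is omitted.

```python
import hashlib

# Hypothetical per-field weights; the claim only says the weight depends on
# the record field associated with the shingle.
FIELD_WEIGHTS = {"name": 4, "city": 1}


def shingles(text, k=3):
    """Return the set of character k-grams of a normalized string."""
    s = " ".join(text.lower().split())
    if len(s) <= k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def hash64(shingle, seed):
    """Deterministic 64-bit hash of a shingle under a given seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def signature(record, num_hashes=32, k=3):
    """Compute a MinHash-style signature over weight-reduced hash values."""
    weighted = {}
    for field, text in record.items():
        w = FIELD_WEIGHTS.get(field, 1)
        for sh in shingles(text, k):
            # Assumption: a shingle seen in several fields keeps its largest weight.
            weighted[sh] = max(w, weighted.get(sh, 0))
    # Assumption: "reducing" means integer division of the hash by the weight.
    return tuple(
        min((hash64(sh, seed) // w for sh, w in weighted.items()), default=0)
        for seed in range(num_hashes)
    )


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions on which two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def records_intersect(a, b, threshold=0.5):
    """Treat two records as intersecting when their signatures agree often enough."""
    return estimated_similarity(signature(a), signature(b)) >= threshold
```

Under these assumptions, `records_intersect({"name": "Alice Johnson", "city": "Boston"}, {"name": "Alice Johnson", "city": "Boston"})` returns `True`, because identical records produce identical signatures. Raising the weight of a field makes agreement on that field count for more in the estimated similarity.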