US 12,321,710 B2
Machine learning-based translation of address strings to standardized addresses
Vera Sazonova, Montréal (CA)
Assigned to SafeGraph, Inc., San Francisco, CA (US)
Filed by SafeGraph, Inc., San Francisco, CA (US)
Filed on Nov. 19, 2021, as Appl. No. 17/531,541.
Prior Publication US 2023/0161976 A1, May 25, 2023
Int. Cl. G06F 40/58 (2020.01); G06N 5/025 (2023.01)
CPC G06F 40/58 (2020.01) [G06N 5/025 (2013.01)]
24 Claims
 
1. A system, comprising:
one or more processors; and
one or more non-transitory computer readable media to store instructions executable by the one or more processors to perform operations comprising:
parsing, using an address parser, a string comprising a set of substrings;
classifying, by a machine learning model comprising a named entity recognition model, the set of substrings into:
a set of address substrings; and
a set of non-address substrings;
mapping a non-address substring from the set of non-address substrings to a non-address component, wherein the non-address substring is excluded from a standardized address based on a jurisdiction-specific template; and
producing, as output, an address component classification for individual substrings in the set of substrings;
wherein the machine learning model further comprises a convolutional neural network configured to encode context-independent vectors into a context-sensitive sentence matrix starting with at least 128 dimensions for individual words in the set of substrings.
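
The operations recited in claim 1 (parsing, classifying, mapping non-address substrings, and producing a per-substring classification) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the comma-splitting `parse`, the label names, the `US_TEMPLATE` list, and the regex rules in `classify`. The regex rules are only a stand-in so the example runs without a trained model; they are not the claimed named entity recognition model or convolutional neural network.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical jurisdiction-specific template: the ordered address components
# that may appear in the standardized address. Labels are illustrative only.
US_TEMPLATE = ["street_line", "city", "state", "postcode"]


@dataclass
class Classified:
    substring: str
    label: str        # address or non-address component name
    is_address: bool  # True when the label appears in the template


def parse(raw: str) -> list[str]:
    """Stand-in address parser: split the input string on commas."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def classify(substring: str) -> str:
    """Rule-based stand-in for the NER model (assumption, not the patent's model)."""
    if re.fullmatch(r"\d{5}(-\d{4})?", substring):
        return "postcode"
    if re.fullmatch(r"[A-Z]{2}", substring):
        return "state"
    if re.match(r"(?i)attn\b", substring):
        return "attention_line"   # non-address component
    if re.fullmatch(r"[\d\-() +]{7,}", substring):
        return "phone"            # non-address component
    if re.match(r"\d+\s+\S", substring):
        return "street_line"
    return "city"


def translate(raw: str, template: list[str] = US_TEMPLATE):
    """Return (per-substring classifications, standardized address string)."""
    classified = []
    for sub in parse(raw):
        label = classify(sub)
        classified.append(Classified(sub, label, label in template))

    # Non-address substrings are mapped to their component label above and
    # excluded here, because the template has no slot for them.
    by_label = {c.label: c.substring for c in classified if c.is_address}
    standardized = ", ".join(by_label[k] for k in template if k in by_label)
    return classified, standardized


if __name__ == "__main__":
    raw = "Attn: Front Desk, 123 Main St, San Francisco, CA, 94105, 415-555-0100"
    labels, address = translate(raw)
    for item in labels:
        print(item)
    print(address)
```

In this sketch the attention line and the phone number still receive a classification in the output, but neither reaches the standardized address, which mirrors the exclusion step of the claim under the assumptions stated above.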