US 12,067,059 B2
Dynamically generating normalized master data
Nichole Haas, Belleville, WA (US); Anuja Khemka, Bellevue, WA (US); William David Jackson, Bellevue, WA (US); Anikate Singh, Bellevue, WA (US); Samartha Tumkur Vani, Bellevue, WA (US); and Lu Zhang, Bellevue, WA (US)
Assigned to SAP SE, Walldorf (DE)
Filed by SAP SE, Walldorf (DE)
Filed on Oct. 31, 2017, as Appl. No. 15/799,890.
Prior Publication US 2019/0130050 A1, May 2, 2019
Int. Cl. G06F 16/903 (2019.01); G06F 16/215 (2019.01); G06F 16/23 (2019.01); G06F 16/2457 (2019.01); G06F 16/27 (2019.01); G06F 17/18 (2006.01); G06N 20/00 (2019.01); G06Q 10/02 (2012.01); G06Q 10/10 (2023.01); G06Q 20/20 (2012.01)
CPC G06F 16/90344 (2019.01) [G06F 16/215 (2019.01); G06F 16/23 (2019.01); G06F 16/24578 (2019.01); G06F 16/27 (2019.01); G06F 17/18 (2013.01); G06N 20/00 (2019.01); G06Q 10/02 (2013.01); G06Q 10/10 (2013.01); G06Q 20/20 (2013.01); G06Q 20/202 (2013.01)]
20 Claims
1. A method comprising:
receiving transactional data from a plurality of databases, the transactional data being received in each database from a plurality of sources and comprising a plurality of input records, each input record comprising a string representation of an entity having a particular location and including a plurality of string components, wherein input records from different sources represent particular entities having corresponding locations using a plurality of different string representations;
mapping string components of one or more of the string representations in each input record into one or more corresponding master string components;
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations that are most similar to string representations in the input record; and
for each input record in the plurality of input records, using a machine learning algorithm to select a master record in the plurality of master records that matches the input record based on a training set, the master string representations in the master records, and the string representations in the input record.
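The retrieval steps of claim 1 (mapping string components to master string components, dividing a record into tokens, and searching the master record data store by those tokens) can be illustrated with a small sketch. It is an assumption-laden illustration, not the patented implementation: the abbreviation table `COMPONENT_MAP`, the hypothetical `MasterIndex` class, the inverted-index lookup, and Jaccard similarity as the "most similar" measure are all choices made here, and the claim specifies none of them. The hotel records are invented sample data.

```python
import re
from collections import defaultdict

# Hypothetical mapping of source-specific string components to master
# string components; the claim does not disclose an actual table.
COMPONENT_MAP = {
    "st": "street",
    "ave": "avenue",
    "blvd": "boulevard",
    "arpt": "airport",
    "intl": "international",
}


def tokenize(text: str) -> set[str]:
    """Split a string representation into lowercase tokens, mapping each
    component to its master form when the table knows one."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {COMPONENT_MAP.get(w, w) for w in words}


class MasterIndex:
    """Hypothetical inverted index from tokens to the master records containing them."""

    def __init__(self, masters: dict[str, str]):
        # Tokens generated from each master string representation.
        self.tokens = {mid: tokenize(text) for mid, text in masters.items()}
        self.postings: dict[str, set[str]] = defaultdict(set)
        for mid, toks in self.tokens.items():
            for tok in toks:
                self.postings[tok].add(mid)

    def search(self, record: str, k: int = 3) -> list[tuple[str, float]]:
        """Return up to k (master_id, Jaccard score) pairs, best first."""
        query = tokenize(record)
        candidates: set[str] = set()
        for tok in query:
            # .get avoids inserting empty postings for unseen tokens.
            candidates |= self.postings.get(tok, set())
        scored = [
            (mid, len(query & self.tokens[mid]) / len(query | self.tokens[mid]))
            for mid in candidates
        ]
        # Highest score first; ties broken by master id for stable output.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


index = MasterIndex({
    "M1": "Hilton Seattle Airport, 17620 International Boulevard",
    "M2": "Marriott Seattle Downtown, 1201 Fourth Avenue",
    "M3": "Hilton Bellevue, 300 112th Avenue SE",
})
print(index.search("Hilton Seattle Arpt, 17620 Intl Blvd"))
```

Once "Arpt", "Intl", and "Blvd" are mapped to their master forms, the input record's token set is identical to that of M1, so M1 ranks first with a score of 1.0; M2 and M3 each share only one token with the input and trail far behind.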
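Claim 1 leaves the machine learning algorithm and its features unspecified. As one hypothetical choice, a logistic-regression classifier trained on labeled (input record, master record) pairs can score each retrieved candidate, and the candidate with the highest predicted match probability is selected. The feature set in `features`, the training pairs, and the hyperparameters below are invented for illustration and do not come from the patent.

```python
import math
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def features(record: str, master: str) -> list[float]:
    """Hypothetical pairwise features; the claim does not name any."""
    a, b = tokens(record), tokens(master)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    coverage = len(a & b) / len(a) if a else 0.0
    nums_a = {t for t in a if t.isdigit()}
    nums_b = {t for t in b if t.isdigit()}
    numbers_agree = 1.0 if nums_a and nums_a == nums_b else 0.0
    return [1.0, jaccard, coverage, numbers_agree]  # leading 1.0 is the bias term


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    xs = [features(r, m) for r, m in pairs]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def match_probability(w, record, master):
    """Predicted probability that the master record matches the input record."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(record, master))))


def select_master(w, record, candidates):
    """Return the candidate master string with the highest match probability."""
    best = max(candidates, key=lambda m: match_probability(w, record, m))
    return best, match_probability(w, record, best)


# Hypothetical training set of labeled (input record, master record) pairs.
AIRPORT = "Hilton Seattle Airport, 17620 International Boulevard"
DOWNTOWN = "Marriott Seattle Downtown, 1201 Fourth Avenue"
BELLEVUE = "Hilton Bellevue, 300 112th Avenue SE"
training_pairs = [
    ("Hilton Seattle Airport 17620 International Blvd", AIRPORT),
    ("Hilton Seattle Airport 17620 International Blvd", BELLEVUE),
    ("Marriott 1201 Fourth Ave Seattle", DOWNTOWN),
    ("Marriott 1201 Fourth Ave Seattle", AIRPORT),
    ("Marriott 1201 Fourth Ave Seattle", BELLEVUE),
]
training_labels = [1, 0, 1, 0, 0]

weights = train(training_pairs, training_labels)
best, prob = select_master(
    weights, "Hilton Bellevue 300 112th Ave SE", [AIRPORT, DOWNTOWN, BELLEVUE]
)
print(best, round(prob, 3))
```

Logistic regression is used here only because it is small enough to write out in full; any classifier that outputs a match score per (input record, master record) pair would fit the same selection step.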