US 12,019,596 B2
System and method for enriching and normalizing data
Niels Hanson, Seattle, WA (US); James Johnson Gardner, Rolling Hills Estates, CA (US); Punit S. Orpe, Mahwah, NJ (US); Wendy Du, Anaheim, CA (US); Laurence Anthony Brown, Dallas, TX (US); Ranjan Vivek Mannige, Atlanta, GA (US); David Green, Evanston, IL (US); Michael Ahn, Burke, VA (US); Yang Zhou, Dallas, TX (US); Andrew Yuan, New York, NY (US); Adam Helio Rosa, Longmont, CO (US); Kyle B. Chen, Chicago, IL (US); Alex Perusse, Seattle, WA (US); Christian Alexander Manaog, Rego Park, NY (US); Yeshwanth Somu, Arlington, VA (US); Xin Cheng, Seattle, WA (US); Torey C. Bearly, Renton, WA (US); Raghav Saboo, New York, NY (US); Sphoorthy Pamaraju, Secaucus, NJ (US); Erik Ernst, Denver, CO (US); Can Ozuretmen, Atlanta, GA (US); and Yuan Zhang, Cincinnati, OH (US)
Assigned to KPMG LLP, New York, NY (US)
Filed by KPMG LLP, New York, NY (US)
Filed on Jan. 13, 2023, as Appl. No. 18/097,053.
Application 18/097,053 is a continuation of application No. 17/675,192, filed on Feb. 18, 2022, granted, now 11,556,510.
Prior Publication US 2023/0267105 A1, Aug. 24, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/215 (2019.01); G06F 16/23 (2019.01); G06F 16/25 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/2379 (2019.01); G06F 16/254 (2019.01)]
21 Claims
 
1. A data aggregation and normalization system for enriching and normalizing data, comprising
a plurality of data sources for providing data that is generated by a plurality of different types of data systems that are managed by different types of software applications,
a data extraction unit for extracting selected portions of the data from the plurality of data sources to form extracted data,
a data storage unit for storing the extracted data,
a data preprocessing and enrichment unit for processing and enriching the extracted data to form cleaned data that is stored in the data storage unit, wherein the data preprocessing and enrichment unit includes
a data cleaning unit for cleaning the extracted data to form cleaned data,
a common data model unit for inserting the cleaned data into a common data model to normalize the cleaned data, and
an assessment unit for assessing a quality of the cleaned data in the common data model, and
a machine language module having a plurality of predefined machine learning units for applying one or more selected machine learning techniques to selected portions of the cleaned data to form machine language data,
wherein the cleaned data includes transaction data, product data, and user data,
wherein the machine language module further comprises a prediction unit for processing the transaction data and the user data and generating a prediction based on an interest in one or more selected products of a selected user,
wherein the prediction unit is configured to generate
a first product interest score indicative of a first interest level in the product by the selected user,
a second product interest score indicative of a second interest level in the product by the selected user,
a community interest score associated with a community interest in the one or more selected products,
a user feature score associated with one or more primary user features of the selected product, and
a product feature score indicative of one or more primary features of the selected product,
and to determine therefrom a final product score indicative of the user interest in the one or more selected products, and
a ranking unit for ranking the final product interest scores.
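
The prediction and ranking steps recited above can be illustrated with a minimal sketch. The claim does not state how the five component scores are combined into the final product score, so the weighted average below is purely an assumption, as are the names `ComponentScores`, `WEIGHTS`, `final_product_score`, and `rank_products` and the weight values themselves.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ComponentScores:
    """The five component scores named in the claim, for one user and one product."""
    first_interest: float      # first product interest score
    second_interest: float     # second product interest score
    community_interest: float  # community interest score
    user_feature: float        # user feature score
    product_feature: float     # product feature score


# Hypothetical weights: the claim does not specify any combination rule.
WEIGHTS = ComponentScores(0.30, 0.20, 0.20, 0.15, 0.15)


def final_product_score(scores: ComponentScores,
                        weights: ComponentScores = WEIGHTS) -> float:
    """Combine the component scores into one final score (assumed weighted average)."""
    w = astuple(weights)
    s = astuple(scores)
    return sum(wi * si for wi, si in zip(w, s)) / sum(w)


def rank_products(scores_by_product: dict[str, ComponentScores]) -> list[tuple[str, float]]:
    """Rank products for a selected user by final score, highest first."""
    finals = {pid: final_product_score(sc) for pid, sc in scores_by_product.items()}
    return sorted(finals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example = {
        "product_a": ComponentScores(0.9, 0.7, 0.5, 0.6, 0.8),
        "product_b": ComponentScores(0.2, 0.3, 0.9, 0.4, 0.1),
    }
    for product_id, score in rank_products(example):
        print(f"{product_id}: {score:.3f}")
```

Any other monotone combination (for example, a learned model over the same five inputs) would fit the claim language equally well; the weighted average is chosen here only because it is easy to inspect.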