US 12,013,838 B2
System and method for automated data integration
Raj Nigam, Haryana (IN); and Manas Sehra, Noida (IN)
Assigned to Fractal Analytics Private Limited, Oberoi Garden (IN)
Filed by Fractal Analytics Private Limited, Oberoi Garden (IN)
Filed on Aug. 12, 2022, as Appl. No. 17/887,097.
Claims priority of application No. 202221037830 (IN), filed on Jun. 30, 2022.
Prior Publication US 2024/0004863 A1, Jan. 4, 2024
Int. Cl. G06F 16/23 (2019.01); G06F 16/25 (2019.01); G06Q 10/0631 (2023.01)
CPC G06F 16/2365 (2019.01) [G06F 16/254 (2019.01); G06Q 10/0631 (2013.01)]
8 Claims
1. A platform for automated data integration implemented on one or more hardware computer processors and one or more storage devices, the platform comprising:
one or more input databases, wherein the one or more input databases are configured to upload data from a plurality of external data sources, the external data sources including real-time systems, near-real-time systems, and batch-oriented applications, wherein the external data sources have different formats;
a data integration system, wherein the data integration system comprises a microservices-based web application server comprising a harmonization component, a master enrichment management component to ensure data quality, and a validation component, a data processing cluster that accesses large-scale data to extract information for supporting and providing business decisions, a metadata store and a message broker; and
an output database configured to store integrated data for business intelligence analysis;
wherein the data integration system is configured to execute code that causes the system to:
receive an inbound message by the message broker;
transmit data from the one or more input databases;
pre-process data, wherein the pre-processing step comprises one or more of removing empty records, removing duplicate records, removing erroneous records, and filling missing records based on an average of historical data, wherein natural language processing (NLP) is applied to process unstructured data for noise removal and text normalization;
extract features from the pre-processed data by enabling a pattern recognition algorithm following a metadata-based mapping logic over the plurality of data types, wherein the metadata store is configured to store data type or feature definitions that are used to access data within the one or more of the input databases;
transform the plurality of data formats to a standardized data format that is selected by a user, wherein the standardized data format is compatible for combining and/or analysis;
enrich the pre-processed data by enabling supervised learning to determine a decision boundary to separate different classes of the data;
align the pre-processed data by enabling supervised learning to unify or de-duplicate records that have similar naming convention for a given feature and standardize text;
enhance accuracy through a user feedback loop by learning from previous prediction and reducing errors in subsequent interactions;
generate master data of a plurality of business entities; and
publish the transformed data to the output database.
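
For illustration only, and not as part of the claim, the pre-processing step recited above (removing empty records, removing duplicate records, and filling missing records based on an average of historical data) might be sketched in Python as follows. The function name `preprocess`, the dictionary-based record layout, the field names, and the sample values are all hypothetical and do not come from the patent. The erroneous-record removal and NLP steps are omitted.

```python
from statistics import mean


def preprocess(records, history, numeric_field):
    """Drop empty and duplicate records, then fill a missing numeric
    field with the average of historical values (hypothetical helper)."""
    fill_value = mean(history)  # average of historical data
    seen = set()
    cleaned = []
    for rec in records:
        # An empty record has no fields, or only None/blank values.
        if not rec or all(v is None or v == "" for v in rec.values()):
            continue
        # Order-independent fingerprint of the record for duplicate detection.
        key = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if key in seen:
            continue
        seen.add(key)
        out = dict(rec)  # copy so the caller's input is not mutated
        if out.get(numeric_field) is None:
            out[numeric_field] = fill_value
        cleaned.append(out)
    return cleaned


# Hypothetical sample data.
records = [
    {"store": "A", "sales": 10.0},
    {"store": "A", "sales": 10.0},   # duplicate record
    {},                              # empty record
    {"store": None, "sales": None},  # empty record (all values missing)
    {"store": "B", "sales": None},   # missing value to be filled
]
history = [8.0, 12.0]
cleaned = preprocess(records, history, "sales")
```

With this sample input, the duplicate and the two empty records are dropped, and the missing `sales` value for store "B" is filled with the historical mean.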