US 12,248,768 B2
System and method for dynamic lineage tracking, reconstruction, and lifecycle management
Ganesh Seetharaman, Redwood Shores, CA (US); Alexander Sasha Stojanovic, Los Gatos, CA (US); Hassan Heidari Namarvar, Mountain View, CA (US); and David Allan, Novato, CA (US)
Assigned to ORACLE INTERNATIONAL CORPORATION, Redwood Shores, CA (US)
Filed by ORACLE INTERNATIONAL CORPORATION, Redwood Shores, CA (US)
Filed on May 6, 2022, as Appl. No. 17/738,774.
Application 17/738,774 is a continuation of application No. 15/683,567, filed on Aug. 22, 2017, granted, now Pat. No. 11,347,482.
Claims priority of provisional application 62/378,143, filed on Aug. 22, 2016.
Claims priority of provisional application 62/378,146, filed on Aug. 22, 2016.
Claims priority of provisional application 62/378,150, filed on Aug. 22, 2016.
Claims priority of provisional application 62/378,152, filed on Aug. 22, 2016.
Claims priority of provisional application 62/378,147, filed on Aug. 22, 2016.
Claims priority of provisional application 62/378,151, filed on Aug. 22, 2016.
Prior Publication US 2022/0269491 A1, Aug. 25, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/00 (2019.01); G06F 3/042 (2006.01); G06F 3/0482 (2013.01); G06F 3/06 (2006.01); G06F 8/10 (2018.01); G06F 8/34 (2018.01); G06F 8/41 (2018.01); G06F 16/14 (2019.01); G06F 16/21 (2019.01); G06F 16/23 (2019.01); G06F 16/25 (2019.01); G06F 16/435 (2019.01); G06F 17/18 (2006.01); G06F 40/30 (2020.01); G06N 5/022 (2023.01); G06N 5/04 (2023.01); G06N 5/046 (2023.01); G06N 20/00 (2019.01); G06Q 10/0637 (2023.01); G06F 9/50 (2006.01)
CPC G06F 8/433 (2013.01) [G06F 3/0428 (2013.01); G06F 3/0482 (2013.01); G06F 3/0649 (2013.01); G06F 8/10 (2013.01); G06F 8/34 (2013.01); G06F 8/4452 (2013.01); G06F 16/144 (2019.01); G06F 16/211 (2019.01); G06F 16/2322 (2019.01); G06F 16/2358 (2019.01); G06F 16/254 (2019.01); G06F 16/435 (2019.01); G06F 17/18 (2013.01); G06F 40/30 (2020.01); G06N 5/022 (2013.01); G06N 5/04 (2013.01); G06N 5/046 (2013.01); G06N 20/00 (2019.01); G06Q 10/0637 (2013.01); G06F 9/5061 (2013.01)]
20 Claims
1. A method for use with a data integration or other computing environment comprising:
providing an event coordinator that operates between one or more design-time and run-time systems, to coordinate events related to design, creation, monitoring, and management of a dataflow application, including events that define state changes associated with modification of the dataflow application;
receiving, via the event coordinator, a notification of data received from one or more data sources, and state transactions associated with the data, wherein the data received from the one or more data sources is associated with a source dataset or entity, and is mapped to a target dataset or entity;
receiving, from a knowledge source that stores profile information and other metadata associated with the one or more data sources comprising datasets, a metadata associated with processing a data flow associated with the one or more data sources, wherein the metadata provides a description of the datasets and their attributes and relationships;
ingesting data from the one or more data sources, and writing ingested data to a data repository operating as a data lake, for use by an input/output layer that provides access to the data structured as topics for use by one or more data flow applications; and
as the data is received from the one or more data sources for use by the data flow applications:
identifying portions of the ingested data corresponding to temporal slices of data, including:
accessing the knowledge source to obtain metadata associated with the ingested data represented by the temporal slices, including a description of the datasets and their attributes and relationships;
receiving from the knowledge source the metadata associated with the portions of the ingested data that indicates, for each portion of the ingested data, the data sources, datasets, and entities providing or operating on that portion of the ingested data; and
writing the portions of the ingested data to the data lake, and associating with each portion a lineage tracking information descriptive of the data sources, datasets, and entities providing or operating on that portion of the ingested data, as provided by the knowledge source,
for use by the one or more data flow applications, including as the portions of ingested data are further processed by the dataflow application, creating additional temporal slices and updating the lineage tracking information associated therewith to reflect the processing of that data.
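The slicing and lineage steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every name below (Record, Lineage, TemporalSlice, KnowledgeSource, DataLake, window_start, ingest, process) is hypothetical, the knowledge source and data lake are in-memory stand-ins, and a fixed one-minute window is assumed as the meaning of a "temporal slice". The event coordinator and the topic-based input/output layer are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterable, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Record:
    dataset: str
    timestamp: datetime  # assumed to be timezone-aware
    payload: dict


@dataclass(frozen=True)
class Lineage:
    sources: tuple
    datasets: tuple
    entities: tuple
    operations: tuple = ()  # processing steps applied so far


@dataclass(frozen=True)
class TemporalSlice:
    window_start: datetime
    records: tuple
    lineage: Lineage


class KnowledgeSource:
    """Hypothetical in-memory stand-in for the metadata store."""

    def __init__(self, metadata: Dict[str, dict]):
        self._metadata = metadata

    def describe(self, dataset: str) -> dict:
        return self._metadata.get(dataset, {"source": "unknown", "entities": []})


class DataLake:
    """Hypothetical append-only stand-in for the data repository."""

    def __init__(self) -> None:
        self.slices: List[TemporalSlice] = []

    def write(self, slice_: TemporalSlice) -> TemporalSlice:
        self.slices.append(slice_)
        return slice_


def window_start(ts: datetime, width: timedelta) -> datetime:
    # Floor the timestamp to the start of its fixed-width window.
    return EPOCH + ((ts - EPOCH) // width) * width


def ingest(records: Iterable[Record], knowledge: KnowledgeSource,
           lake: DataLake, width: timedelta = timedelta(minutes=1)) -> List[TemporalSlice]:
    # Group incoming records into temporal slices.
    buckets = defaultdict(list)
    for record in records:
        buckets[window_start(record.timestamp, width)].append(record)

    written = []
    for start in sorted(buckets):
        group = buckets[start]
        datasets = sorted({r.dataset for r in group})
        # Ask the knowledge source which sources and entities back each dataset.
        meta = [knowledge.describe(d) for d in datasets]
        lineage = Lineage(
            sources=tuple(sorted({m["source"] for m in meta})),
            datasets=tuple(datasets),
            entities=tuple(sorted({e for m in meta for e in m["entities"]})),
        )
        written.append(lake.write(TemporalSlice(start, tuple(group), lineage)))
    return written


def process(slice_: TemporalSlice, operation: str,
            transform: Callable[[List[Record]], List[Record]],
            lake: DataLake) -> TemporalSlice:
    # Further processing creates an additional slice whose lineage
    # records the operation; the original slice is left unchanged.
    new_records = tuple(transform(list(slice_.records)))
    new_lineage = replace(
        slice_.lineage, operations=slice_.lineage.operations + (operation,)
    )
    return lake.write(TemporalSlice(slice_.window_start, new_records, new_lineage))
```

In this sketch the slices are immutable, so each processing step appends a new slice to the lake rather than overwriting an earlier one. That keeps the full chain of operations recoverable from the stored lineage, which is one plausible reading of "updating the lineage tracking information" in the claim.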