CPC G06F 16/24568 (2019.01) [G06F 9/4881 (2013.01); G06F 16/244 (2019.01); G06F 16/288 (2019.01)]
24 Claims
1. A method for tracking data lineage, the method comprises:
detecting, with a data lineage recorder module, an execution of a task of a workflow by a workflow engine on a device, the workflow comprises a plurality of ordered tasks for execution on the workflow engine, wherein the data lineage recorder module captures events emitted from the workflow engine that do not pass through a data lineage proxy;
receiving, at the data lineage proxy, a request sent from the device to the data lineage proxy for forwarding to a recipient;
automatically identifying, with the data lineage proxy, an identity of the recipient of the request based on an IP address of the recipient for forwarding the request received at the data lineage proxy, a content of the request, or a content of a response associated with the request;
selecting, with the data lineage recorder module, request lineage data items associated with the task from the content of the request;
associating, in a data lineage database, the request lineage data items with the task currently executed by the workflow engine of the plurality of ordered tasks;
forwarding, from the data lineage proxy, the request to the recipient;
receiving, at the data lineage proxy, the response to the request from the recipient;
automatically identifying the task and the device associated with the response to the request;
selecting, with the data lineage recorder module, response lineage data items associated with the task from the content of the response to the request;
associating, in the data lineage database, the response lineage data items with the task currently executed by the workflow engine of the plurality of ordered tasks;
forwarding, from the data lineage proxy, the response to the device;
recording, with the data lineage recorder module and based on the events emitted by the workflow engine, a completion of the task and an output of the task to a subsequent task within the workflow; and
generating for display on a user interface device, a data lineage graph comprising representations of the plurality of ordered tasks of the workflow including the task, the request lineage data items associated with the task, the response lineage data items associated with the task, and input and output interconnections between the plurality of ordered tasks within the workflow.
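
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `LineageStore`, `Recorder`, `LineageProxy`, `lineage_graph`, `fake_send`, the tracked field names, and the IP-to-recipient table are all hypothetical, and an in-memory dictionary stands in for the data lineage database. The sketch assumes the recorder learns the current task from engine events and that the proxy identifies the recipient by IP address only.

```python
from dataclasses import dataclass, field


@dataclass
class LineageStore:
    """In-memory stand-in for the data lineage database (hypothetical)."""
    items: dict = field(default_factory=dict)  # task -> [(kind, key, value)]
    edges: list = field(default_factory=list)  # (task, next_task, output)


class Recorder:
    """Hypothetical recorder fed by engine events and by the proxy."""

    def __init__(self, store, tracked_keys):
        self.store = store
        self.tracked_keys = tracked_keys
        self.current_task = {}  # device -> task currently executing

    def on_task_started(self, device, task):
        # Engine event: does not pass through the proxy.
        self.current_task[device] = task
        self.store.items.setdefault(task, [])

    def on_task_completed(self, device, task, output, next_task=None):
        # Engine event: record completion and the hand-off to the next task.
        self.current_task.pop(device, None)
        if next_task is not None:
            self.store.edges.append((task, next_task, output))

    def record(self, task, kind, content):
        # Select only the tracked fields and associate them with the task.
        bucket = self.store.items.setdefault(task, [])
        for key in self.tracked_keys:
            if key in content:
                bucket.append((kind, key, content[key]))


class LineageProxy:
    """Hypothetical forwarding proxy that reports traffic to the recorder."""

    def __init__(self, recorder, recipients_by_ip, send):
        self.recorder = recorder
        self.recipients_by_ip = recipients_by_ip
        self.send = send  # callable(ip, request) -> response

    def handle(self, device, recipient_ip, request):
        # Identify the recipient from its IP address (one of the claimed options).
        recipient = self.recipients_by_ip.get(recipient_ip, "unknown")
        task = self.recorder.current_task.get(device)
        self.recorder.record(task, "request", {**request, "recipient": recipient})
        response = self.send(recipient_ip, request)  # forward to the recipient
        self.recorder.record(task, "response", response)
        return response  # forwarded back to the device


def lineage_graph(store, ordered_tasks):
    """Build a plain-dict graph that a user interface could render."""
    return {
        "nodes": [{"task": t, "items": store.items.get(t, [])} for t in ordered_tasks],
        "edges": [{"from": a, "to": b, "output": out} for a, b, out in store.edges],
    }


def fake_send(ip, request):
    # Stand-in for the real network call to the recipient.
    return {"rows": 3, "status": 200}


store = LineageStore()
recorder = Recorder(store, tracked_keys=("recipient", "table", "rows"))
proxy = LineageProxy(recorder, {"10.0.0.5": "warehouse-api"}, fake_send)

recorder.on_task_started("worker-1", "extract")
response = proxy.handle("worker-1", "10.0.0.5", {"table": "orders", "token": "secret"})
recorder.on_task_completed("worker-1", "extract", output="orders.csv", next_task="load")
graph = lineage_graph(store, ["extract", "load"])
```

In this sketch the untracked `token` field never reaches the store, which mirrors the claim's distinction between the full content of a request and the lineage data items selected from it.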