US 12,411,717 B1
System and method for observing and predicting data batch activity in real time
Vishal Deshmukh, Plainsboro, NJ (US); Sujit Eapen, Plainsboro, NJ (US); Himanshu Rout, Bengaluru (IN); Rashad Barron, Stroudsburg, PA (US); Rachit Mehrotra, Princeton, NJ (US); Tanmay Nagar, Mumbai (IN); and Tejas Shah, Bengaluru (IN)
Assigned to Morgan Stanley Services Group Inc., New York, NY (US)
Filed by Morgan Stanley Services Group Inc., New York, NY (US)
Filed on Nov. 19, 2024, as Appl. No. 18/951,845.
Application 18/951,845 is a continuation of application No. 18/739,658, filed on Jun. 11, 2024, granted, now 12,190,166.
Int. Cl. G06F 9/50 (2006.01)
CPC G06F 9/505 (2013.01)
8 Claims
 
1. A computer-implemented method for monitoring data batch activity and providing event class failure recovery in real time comprising:
obtaining i) ordered lists of jobs of a batch from one or more scheduling platforms and ii) information related to file transfers from one or more file transfer sources, the ordered lists and information related to file transfers being updated in real time;
extracting data from the ordered lists and the information from the one or more file transfer sources;
enriching the extracted data using additional information retrieved from at least one metadata repository;
generating a dependency graph that includes real time job and file transfer data obtained from the enriched data in which nodes of the graph represent events and edges represent relationships between the nodes indicating a dependence, wherein if one event is dependent on the execution of a prior event, the event is considered dependent on the prior event;
obtaining critical milestones from the dependency graph;
generating critical paths for traversing the dependency graph for job and file transfer execution using the milestones;
receiving notification of a specific failed job or file transfer;
determining a failure code for the specific failed job or transfer;
determining a position of the specific failed job or transfer in the dependency graph;
obtaining historical data concerning failure modes of the jobs and file transfers; and
determining whether recovery of an event class corresponding to the specific failed job or file transfer can be automated based on the failure code, position of the specific failed job or transfer in the dependency graph, and historical data corresponding to the failure code;
when it is determined that recovery of the event class corresponding to the specific failed job can be automated based on the failure code, predicting a time window for automatically restarting the event class; and
scheduling and executing a restart of the event class during the predicted time window.
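
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The event names, durations, failure codes, success-rate thresholds, and the helper functions `critical_path`, `can_automate`, and `restart_window` are all hypothetical and chosen only for illustration. The sketch builds a small dependency graph, derives the longest-duration path to a milestone, and applies one simple rule that combines failure code, graph position, and historical data to decide whether a restart could be automated and within what window.

```python
from graphlib import TopologicalSorter

# Hypothetical events: name -> (duration in minutes, prior events it depends on).
EVENTS = {
    "ingest_file": (10, []),
    "load_trades": (20, ["ingest_file"]),
    "load_prices": (5, ["ingest_file"]),
    "compute_risk": (30, ["load_trades", "load_prices"]),
    "publish_report": (5, ["compute_risk"]),
}

# Hypothetical history: failure code -> (successful restarts, restarts attempted).
HISTORY = {"E_TIMEOUT": (39, 40), "E_SCHEMA": (1, 10)}


def critical_path(events, milestone):
    """Return the longest-duration dependency chain ending at the milestone."""
    graph = {name: deps for name, (_, deps) in events.items()}
    finish, prev = {}, {}
    # static_order() yields every event after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        duration, deps = events[name]
        slowest = max(deps, key=lambda d: finish[d], default=None)
        prev[name] = slowest
        finish[name] = duration + (finish[slowest] if slowest else 0)
    path, node = [], milestone
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], finish[milestone]


def can_automate(failure_code, node, path, history):
    """Assumed rule: require a higher historical success rate on the critical path."""
    succeeded, attempted = history.get(failure_code, (0, 0))
    if attempted == 0:
        return False  # no history for this code, so leave recovery to an operator
    threshold = 0.95 if node in path else 0.80  # illustrative thresholds
    return succeeded / attempted >= threshold


def restart_window(now, deadline, node, path, events):
    """Return (earliest, latest) restart minute that still meets the deadline."""
    remaining = sum(events[n][0] for n in path[path.index(node):])
    latest = deadline - remaining
    return (now, latest) if latest >= now else None


path, total = critical_path(EVENTS, "publish_report")
automate = can_automate("E_TIMEOUT", "load_trades", path, HISTORY)
window = restart_window(0, 90, "load_trades", path, EVENTS) if automate else None
print(path, total, automate, window)
```

In this sketch a failed event with no recorded history is never restarted automatically, which is one conservative way to read the claim's reliance on historical data; a production system would likely use richer failure statistics and real timestamps rather than minute offsets.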