| CPC G06F 11/3476 (2013.01) [G06F 11/0772 (2013.01); G06F 11/323 (2013.01); G06F 11/3409 (2013.01); G06N 20/00 (2019.01)] | 17 Claims |

|
1. A computer-implemented method comprising:
obtaining, by an application server, raw distributed trace data for a large-scale distributed system from a plurality of distributed tracing clients in the large-scale distributed system;
aggregating, by the application server, the raw distributed trace data into aggregated distributed trace data;
pre-processing, by the application server, the aggregated distributed trace data to repair at least one trace that is incomplete, broken or incorrect using an infrastructure design for the large-scale distributed system, the infrastructure design comprising a dependency graph indicating dependencies among a plurality of devices and services in the large-scale distributed system independent of the raw distributed trace data;
generating, by the application server, a plurality of process flow graphs from the pre-processed aggregated distributed trace data;
storing, by the application server, the plurality of process flow graphs in graph-based storage in communication with the application server;
processing a graph query using the graph-based storage to determine a first critical path from the plurality of process flow graphs based on the infrastructure design for the large-scale distributed system including the dependency graph indicating dependencies among the plurality of devices and services in the large-scale distributed system; and
providing a process flow graph corresponding to the first critical path for graphical display.
|
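As a purely illustrative aid, the repair and critical-path steps recited in claim 1 can be sketched in Python. This is not the claimed implementation. Every name here is hypothetical (the `DEPENDENCIES` map, the span fields `id`, `service`, `parent`, `self_ms`, and both functions), and an in-memory dictionary stands in for the graph-based storage and graph query. The sketch assumes a span that lost its parent link can be reattached when the dependency graph names exactly one caller present in the same trace, and that the critical path is the root-to-leaf chain with the largest summed per-span time.

```python
from collections import defaultdict

# Hypothetical infrastructure design: caller service -> services it calls.
# It is defined independently of any trace data.
DEPENDENCIES = {
    "gateway": ["auth", "orders"],
    "orders": ["db"],
    "auth": [],
    "db": [],
}


def repair_trace(spans, dependencies):
    """Reattach spans with a missing parent using the dependency graph."""
    by_service = {s["service"]: s["id"] for s in spans}
    # Invert the design: callee service -> list of caller services.
    callers = defaultdict(list)
    for caller, callees in dependencies.items():
        for callee in callees:
            callers[callee].append(caller)

    repaired = []
    for span in spans:
        span = dict(span)  # copy so the input trace is left unchanged
        if span["parent"] is None:
            candidates = [
                by_service[c] for c in callers[span["service"]] if c in by_service
            ]
            # Repair only when the design gives an unambiguous caller.
            if len(candidates) == 1:
                span["parent"] = candidates[0]
        repaired.append(span)
    return repaired


def critical_path(spans):
    """Return (total_ms, [services]) for the costliest root-to-leaf chain."""
    by_id = {s["id"]: s for s in spans}
    children = defaultdict(list)
    roots = []
    for s in spans:
        if s["parent"] is None:
            roots.append(s["id"])
        else:
            children[s["parent"]].append(s["id"])

    def best(span_id):
        span = by_id[span_id]
        below = [best(c) for c in children[span_id]]
        if not below:
            return span["self_ms"], [span["service"]]
        cost, path = max(below, key=lambda item: item[0])
        return span["self_ms"] + cost, [span["service"]] + path

    return max((best(r) for r in roots), key=lambda item: item[0])


# One trace in which the "db" span lost its parent link (a broken trace).
raw_trace = [
    {"id": 1, "service": "gateway", "parent": None, "self_ms": 5},
    {"id": 2, "service": "auth", "parent": 1, "self_ms": 10},
    {"id": 3, "service": "orders", "parent": 1, "self_ms": 20},
    {"id": 4, "service": "db", "parent": None, "self_ms": 40},
]

repaired_trace = repair_trace(raw_trace, DEPENDENCIES)
total_ms, path = critical_path(repaired_trace)
```

In this sketch the orphaned `db` span is reattached under `orders`, because the design lists `orders` as the only caller of `db`, so the critical path runs through `gateway`, `orders`, and `db`. Without the repair step, the `db` span would be treated as a separate root and the path would be split.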