US 12,450,799 B1
Multi-language program and data flow analysis using LLM
Kallol Duttagupta, Basking Ridge, NJ (US); Kumar Vadaparty, Belle Mead, NJ (US); Thomas Mathew, Parsippany, NJ (US); and Vivek S. Agrawal, Princeton, NJ (US)
Assigned to Morgan Stanley Services Group Inc., New York, NY (US)
Filed by Morgan Stanley Services Group Inc., New York, NY (US)
Filed on Mar. 14, 2025, as Appl. No. 19/080,275.
Int. Cl. G06T 11/20 (2006.01); G06F 8/41 (2018.01)
CPC G06T 11/206 (2013.01) [G06F 8/433 (2013.01)]
16 Claims
1. A computer-implemented method for analyzing program and data flows in a software system of an enterprise, wherein the software system comprises code written in multiple programming languages, the method comprising:
receiving, by a backend computer system, source code from a code repository, wherein the source code is for the software system and includes code written in at least two different programming languages;
selecting, by a user via a front-end interface, programming-language-specific prompts from a prompt library, wherein the prompt library comprises a plurality of predefined prompts tailored to different programming languages;
processing, by a generative large language model (LLM) executed on the backend computer system, the received source code using the selected prompts to generate a plurality of labeled graph nodes, wherein each labeled graph node represents a functional component of the software system and comprises:
a node type;
a node name; and
for each functional component of the software system dependent on one or more other labeled graph nodes, dependency information for the node corresponding to the functional component, wherein the dependency information identifies the one or more other labeled graph nodes on which the functional component depends;
generating, by a graph construction computer system, a directed graph based on the plurality of labeled graph nodes, wherein:
each labeled graph node is represented as a node in the directed graph, and
directed edges between the nodes are established based on the dependency information; and
providing, via the front-end interface, a visual representation of the directed graph to the user, thereby enabling analysis of program and data flows across the software system independent of the programming languages in which the code of the software system is written.
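The graph-construction step of claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the names `LabeledNode`, `build_directed_graph`, `transitive_dependencies`, and the sample node data are all hypothetical, and the edge direction (from a dependent node to the node it depends on) is an assumption, since the claim only states that edges are established based on the dependency information. The LLM step is omitted; the sketch starts from labeled nodes that are assumed to have already been produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledNode:
    """Hypothetical labeled graph node: a type, a name, and dependency names."""
    node_type: str
    name: str
    depends_on: tuple = ()


def build_directed_graph(nodes):
    """Build an adjacency map {node name: sorted list of dependency names}.

    Assumption: an edge A -> B means "A depends on B". A dependency that
    names a node missing from the input is added as a node with no edges.
    """
    graph = {node.name: set() for node in nodes}
    for node in nodes:
        for dep in node.depends_on:
            graph.setdefault(dep, set())
            graph[node.name].add(dep)
    return {name: sorted(deps) for name, deps in graph.items()}


def transitive_dependencies(graph, start):
    """Return every node reachable from `start` by following edges."""
    seen = set()
    stack = list(graph.get(start, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


# Hypothetical nodes, as if extracted from code in different languages.
SAMPLE_NODES = [
    LabeledNode("cobol_program", "ACCT_BATCH", ("ACCT_TABLE",)),
    LabeledNode("java_service", "AccountService", ("ACCT_BATCH",)),
    LabeledNode("db_table", "ACCT_TABLE"),
]

GRAPH = build_directed_graph(SAMPLE_NODES)
```

Because each node carries only a type, a name, and dependency names, a traversal such as `transitive_dependencies(GRAPH, "AccountService")` follows flows across components without regard to the language each component was written in.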