US 12,135,740 B1
Generating a unified metadata graph via a retrieval-augmented generation (RAG) framework systems and methods
Linfeng Yu, New York, NY (US); Vaibhav Kumar, New York, NY (US); and Ashutosh Pandey, New York, NY (US)
Assigned to CITIBANK, N.A., New York, NY (US)
Filed by Citibank, N.A., New York, NY (US)
Filed on Apr. 4, 2024, as Appl. No. 18/627,332.
Application 18/627,332 is a continuation in part of application No. 18/390,916, filed on Dec. 20, 2023, granted, now 11,971,891.
Int. Cl. G06F 17/00 (2019.01); G06F 16/332 (2019.01); G06F 16/383 (2019.01)
CPC G06F 16/383 (2019.01) [G06F 16/3329 (2019.01)]
20 Claims
1. A system for reducing data retrieval times when accessing siloed data across disparate locations by generating a unified metadata graph via a Retrieval-Augmented Generation (RAG) framework, the system comprising:
at least one hardware processor; and
at least one non-transitory memory storing instructions, which, when executed by the at least one hardware processor, cause the system to:
receive, from a set of data silos, raw data comprising a set of metadata identifiers indicating (i) file-level metadata identifiers, (ii) container-level metadata identifiers, and (iii) system-level metadata identifiers;
select, from a set of structured Large Language Model (LLM) prompts, a first structured LLM prompt corresponding to a first metadata identifier of the set of metadata identifiers;
augment the first structured LLM prompt with the first metadata identifier to be provided to an LLM communicatively coupled to a set of domain-specific ontologies, wherein the LLM is configured to generate a first intermediate output indicating a second set of metadata identifiers corresponding to the first metadata identifier without accessing the set of domain-specific ontologies;
augment the first structured LLM prompt with the second set of metadata identifiers corresponding to the first metadata identifier to be provided to the LLM, wherein the LLM is configured to generate a second intermediate output indicating a filtered domain-specific metadata identifier by accessing the set of domain-specific ontologies;
generate a domain-specific unified metadata graph, via the LLM, using (i) the first metadata identifier and (ii) the second intermediate output indicating the filtered domain-specific metadata identifier, wherein the filtered domain-specific metadata identifier is a traversable identifier and the first metadata identifier is a non-traversable identifier within the domain-specific unified metadata graph;
perform a validation process on the domain-specific unified metadata graph by comparing first performance metrics of the domain-specific unified metadata graph to second performance metrics of another version of the domain-specific unified metadata graph; and
in response to determining that the first performance metrics fail to meet or exceed the second performance metrics of the other version of the domain-specific unified metadata graph, perform an update process on the domain-specific unified metadata graph.
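
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is only one possible reading under stated assumptions, not the patented implementation: the prompt templates, the ontology contents, the canned candidate list, the stand-in functions `llm_first_pass` and `llm_second_pass`, the `MetadataGraph` class, and the metric names are all hypothetical and do not appear in the claim. The two LLM passes are replaced by deterministic stubs so that the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical structured prompts, one per metadata level named in the claim.
PROMPTS: Dict[str, str] = {
    "file": "List identifiers related to file-level identifier '{identifier}'.",
    "container": "List identifiers related to container-level identifier '{identifier}'.",
    "system": "List identifiers related to system-level identifier '{identifier}'.",
}

# Hypothetical domain-specific ontology: identifiers the domain recognizes.
ONTOLOGY: Set[str] = {"customer_id", "account_id"}

# Canned first-pass output standing in for a real model's response.
CANNED: Dict[str, List[str]] = {"cust_no": ["client_ref", "customer_id"]}


def llm_first_pass(prompt: str, identifier: str) -> List[str]:
    """Stub for the first pass: proposes candidates WITHOUT the ontology."""
    return CANNED.get(identifier, [])


def llm_second_pass(prompt: str, candidates: List[str],
                    ontology: Set[str]) -> Optional[str]:
    """Stub for the second pass: keeps the first candidate found in the ontology."""
    for candidate in candidates:
        if candidate in ontology:
            return candidate
    return None


@dataclass
class MetadataGraph:
    # Maps identifier -> whether the node is traversable.
    nodes: Dict[str, bool] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)


def build_graph(level: str, identifier: str) -> MetadataGraph:
    # Select the structured prompt and augment it with the raw identifier.
    prompt = PROMPTS[level].format(identifier=identifier)
    candidates = llm_first_pass(prompt, identifier)
    # Augment the same prompt with the candidates for the ontology-backed pass.
    prompt2 = prompt + "\nCandidates: " + ", ".join(candidates)
    filtered = llm_second_pass(prompt2, candidates, ONTOLOGY)

    graph = MetadataGraph()
    graph.nodes[identifier] = False        # raw identifier: non-traversable
    if filtered is not None:
        graph.nodes[filtered] = True       # filtered identifier: traversable
        graph.edges.append((filtered, identifier))
    return graph


def needs_update(first: Dict[str, float], second: Dict[str, float]) -> bool:
    """True if any metric of the new graph falls below the other version's."""
    return any(first.get(name, float("-inf")) < value
               for name, value in second.items())


graph = build_graph("file", "cust_no")
update_required = needs_update({"recall": 0.7}, {"recall": 0.8})
```

In this sketch, a higher metric value is assumed to be better, so "fail to meet or exceed" is modeled as a strict less-than comparison on any shared metric. A real system would replace the stubs with calls to an actual LLM and a real ontology store.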