US 12,321,343 B1
Natural language to SQL on custom enterprise data warehouse powered by generative artificial intelligence
Sanjit Vijay Mehta, Bengaluru (IN); Ashish Singh, Bengaluru (IN); Mayank Jain, Bengaluru (IN); Meet Singh, Bengaluru (IN); Satya Verma, Bengaluru (IN); Abhijit Anant Naik, Mumbai (IN); Mehak Mehta, Jersey City, NY (US); Vijay Kumar Butte, Princeton Junction, NJ (US); Sourabh Kumar Janghel, Bengaluru (IN); and Aditya Ramesh, Bengaluru (IN)
Assigned to Morgan Stanley Services Group Inc., New York, NY (US)
Filed by Morgan Stanley Services Group Inc., New York, NY (US)
Filed on Feb. 6, 2025, as Appl. No. 19/046,934.
Int. Cl. G06F 16/242 (2019.01); G06F 16/2453 (2019.01)
CPC G06F 16/243 (2019.01) [G06F 16/24539 (2019.01)]
14 Claims
 
1. A computer-implemented system comprising:
a computer server comprising one or more processors;
a memory component storing data tables; and
non-transitory memory comprising instructions that, when executed by the one or more processors, cause the one or more processors to:
receive, from a user via a user interface, a user query in natural language format to retrieve the data tables;
in response to the receiving the user query:
extract, via an attribute extractor, attributes from the user query;
map, based on indexing and vectorization via a domain mapper, the attributes to a relevant domain and a relevant sub-domain included in a standardized metadata store, wherein the standardized metadata store is generated by a standard data model using contextual data domain mapping and common data domain mapping, wherein the standard data model is based on physical data model and data statistics from a plurality of firmwide metadata systems;
execute, via a retriever, a set of similarity searches limited to the relevant domain and the relevant sub-domain of the standardized metadata store to generate results comprising one or more tables and columns, wherein the set of similarity searches are performed by a semantic retriever, a hybrid retriever and a graph retriever executing in combination, wherein the executing the set of similarity searches includes running the similarity searches on indexes identified by the domain mapper to find the tables and columns matching the attributes in the user query, wherein the retriever further retrieves, from few shot queries, datasets including the tables and columns;
apply, via a re-ranker, a re-ranking model to the results of the set of similarity searches to generate a re-ranked result including re-ranked tables and columns;
generate, via a generator, a query statement using the re-ranked result embedded with an enhanced prompt by applying a set of prompt configurations, wherein the set of prompt configurations comprise two or more of: domain specific guidelines, query optimization guidelines, and output formatting; and
transmit, on the same prompt via a communication network, the query statement with corresponding explanation including constraints to the user interface.
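
The flow recited in claim 1 (attribute extraction, domain mapping, domain-limited retrieval by three retrievers in combination, re-ranking, and prompt assembly) can be illustrated with a minimal Python sketch. Nothing below is taken from the patent's implementation: the metadata rows, function names, stop-word list, and prompt configuration text are hypothetical. Token overlap stands in for vectorized similarity, the "hybrid" and "graph" retrievers are simplified stand-ins, reciprocal rank fusion is an assumed way of combining the three retrievers, and the re-ranker is a placeholder for a re-ranking model. The generator step stops at building the enhanced prompt and does not call a language model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:  # one row of a hypothetical standardized metadata store
    domain: str
    sub_domain: str
    table: str
    name: str
    description: str

STORE = [
    Column("trading", "equities", "eq_trades", "trade_date", "date the equity trade executed"),
    Column("trading", "equities", "eq_trades", "notional", "trade notional amount in usd"),
    Column("trading", "equities", "eq_accounts", "account_id", "client account identifier"),
    Column("hr", "payroll", "salaries", "amount", "monthly salary amount"),
]
STOPWORDS = {"the", "of", "by", "for", "in", "a", "show", "me"}

def extract_attributes(query: str) -> set[str]:
    # Attribute extractor stand-in: lowercase tokens minus stop words.
    return {w for w in query.lower().split() if w not in STOPWORDS}

def describe(col: Column) -> set[str]:
    return set(col.description.split())

def overlap(a: set[str], b: set[str]) -> float:
    # Jaccard overlap, used here in place of embedding similarity.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def map_domain(attrs: set[str]) -> tuple[str, str]:
    # Domain mapper stand-in: pick the (domain, sub-domain) with the highest total overlap.
    scores: dict[tuple[str, str], float] = {}
    for col in STORE:
        key = (col.domain, col.sub_domain)
        scores[key] = scores.get(key, 0.0) + overlap(attrs, describe(col))
    return max(scores, key=scores.get)

def semantic(attrs, cols):
    return sorted(cols, key=lambda c: -overlap(attrs, describe(c)))

def hybrid(attrs, cols):
    # Assumed "hybrid": description overlap plus a bonus for a column-name keyword hit.
    def score(c):
        name_hit = 1.0 if attrs & set(c.name.split("_")) else 0.0
        return name_hit + overlap(attrs, describe(c))
    return sorted(cols, key=lambda c: -score(c))

def graph(attrs, cols):
    # Assumed "graph": a column inherits the summed score of its table's columns.
    table_score: dict[str, float] = {}
    for c in cols:
        table_score[c.table] = table_score.get(c.table, 0.0) + overlap(attrs, describe(c))
    return sorted(cols, key=lambda c: -table_score[c.table])

def retrieve(attrs, domain, k: int = 60):
    # Search only the mapped domain and sub-domain, then fuse the three rankings
    # with reciprocal rank fusion (an assumption, not stated in the claim).
    cols = [c for c in STORE if (c.domain, c.sub_domain) == domain]
    fused: dict[Column, float] = {}
    for ranking in (semantic(attrs, cols), hybrid(attrs, cols), graph(attrs, cols)):
        for rank, c in enumerate(ranking, start=1):
            fused[c] = fused.get(c, 0.0) + 1.0 / (k + rank)
    return sorted(cols, key=lambda c: -fused[c])

def rerank(attrs, cols, top_n: int = 2):
    # Placeholder for a re-ranking model: re-score and keep the top_n columns.
    return sorted(cols, key=lambda c: -overlap(attrs, describe(c)))[:top_n]

PROMPT_CONFIG = {  # hypothetical prompt configurations
    "domain guidelines": "Amounts are in USD.",
    "query optimization": "Select only the columns that are needed.",
    "output formatting": "Return one SQL statement followed by a short explanation.",
}

def build_prompt(query: str, cols, config=PROMPT_CONFIG) -> str:
    rules = "\n".join(f"[{name}] {text}" for name, text in config.items())
    schema = "\n".join(f"- {c.table}.{c.name}: {c.description}" for c in cols)
    return f"{rules}\nCandidate columns:\n{schema}\nQuestion: {query}"

query = "total notional of equity trades by trade date"
attrs = extract_attributes(query)
domain = map_domain(attrs)
top = rerank(attrs, retrieve(attrs, domain))
prompt = build_prompt(query, top)
```

In this toy run the query maps to the hypothetical trading/equities domain, so the payroll table is never searched, which mirrors the claim's limitation of the similarity searches to the relevant domain and sub-domain.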