CPC G06F 16/243 (2019.01) [G06F 16/24539 (2019.01)] | 14 Claims |
1. A computer-implemented system comprising:
a computer server comprising one or more processors;
a memory component storing data tables; and
non-transitory memory comprising instructions that, when executed by the one or more processors, cause the one or more processors to:
receive, from a user via a user interface, a user query in natural language format to retrieve the data tables;
in response to receiving the user query:
extract, via an attribute extractor, attributes from the user query;
map, based on indexing and vectorization via a domain mapper, the attributes to a relevant domain and a relevant sub-domain included in a standardized metadata store, wherein the standardized metadata store is generated by a standard data model using contextual data domain mapping and common data domain mapping, wherein the standard data model is based on a physical data model and data statistics from a plurality of firmwide metadata systems;
execute, via a retriever, a set of similarity searches limited to the relevant domain and the relevant sub-domain of the standardized metadata store to generate results comprising one or more tables and columns, wherein the set of similarity searches is performed by a semantic retriever, a hybrid retriever, and a graph retriever executing in combination, wherein executing the set of similarity searches includes running the similarity searches on indexes identified by the domain mapper to find the tables and columns matching the attributes in the user query, wherein the retriever further retrieves, from few-shot queries, datasets including the tables and columns;
apply, via a re-ranker, a re-ranking model to the results of the set of similarity searches to generate a re-ranked result including re-ranked tables and columns;
generate, via a generator, a query statement using the re-ranked result embedded with an enhanced prompt by applying a set of prompt configurations, wherein the set of prompt configurations comprises two or more of: domain specific guidelines, query optimization guidelines, and output formatting; and
transmit, on the same prompt via a communication network, the query statement with corresponding explanation including constraints to the user interface.
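The flow recited in claim 1 (extract attributes, map them to a domain and sub-domain, search only within that domain, re-rank, then assemble an enhanced prompt) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the claim: the toy `METADATA_STORE`, the table and column names, the stopword list, and all function names are hypothetical. A single bag-of-words cosine similarity stands in for the semantic, hybrid, and graph retrievers; a fixed exact-match boost stands in for the re-ranking model; and the call to a generator model is omitted, so the sketch stops at the prompt.

```python
import math
import re
from collections import Counter

# Hypothetical toy stand-in for the standardized metadata store:
# (domain, sub-domain) -> {table name: [column names]}.
METADATA_STORE = {
    ("finance", "payments"): {
        "payment_txn": ["txn_id", "amount", "currency", "customer_id"],
        "payment_refund": ["refund_id", "txn_id", "amount"],
    },
    ("hr", "payroll"): {
        "employee_salary": ["employee_id", "salary", "pay_date"],
    },
}

STOPWORDS = {"show", "me", "the", "a", "an", "of", "for", "by", "to", "all", "in", "and"}


def extract_attributes(query):
    """Attribute extractor stand-in: lowercase keywords minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", query.lower()) if t not in STOPWORDS]


def name_words(*names):
    """Split snake_case identifiers into a flat list of lowercase words."""
    return [w for n in names for w in n.lower().split("_")]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_domain(attributes):
    """Domain mapper stand-in: pick the (domain, sub-domain) key whose
    names, tables, and columns are most similar to the attributes."""
    query_vec = Counter(attributes)

    def score(key):
        words = list(key)
        for table, columns in METADATA_STORE[key].items():
            words += name_words(table, *columns)
        return cosine(query_vec, Counter(words))

    return max(METADATA_STORE, key=score)


def retrieve(attributes, domain_key):
    """Similarity search limited to one domain; returns (table, column, score)."""
    query_vec = Counter(attributes)
    results = []
    for table, columns in METADATA_STORE[domain_key].items():
        for column in columns:
            score = cosine(query_vec, Counter(name_words(table, column)))
            if score > 0:
                results.append((table, column, score))
    return results


def rerank(results, attributes):
    """Re-ranker stand-in: boost columns whose name exactly matches an attribute."""
    boosted = [(t, c, s + (0.5 if c in attributes else 0.0)) for t, c, s in results]
    return sorted(boosted, key=lambda r: r[2], reverse=True)


def build_prompt(query, ranked, guidelines):
    """Embed the re-ranked tables and columns with guideline text in one prompt."""
    rules = "\n".join(f"- {g}" for g in guidelines)
    candidates = "\n".join(f"- {t}.{c}" for t, c, _ in ranked)
    return (f"Guidelines:\n{rules}\nCandidate columns:\n{candidates}\n"
            f"Question: {query}\nSQL:")
```

Calling `extract_attributes`, `map_domain`, `retrieve`, `rerank`, and `build_prompt` in that order on a question such as "Show me the refund amount for payments" yields a prompt that lists only columns from the mapped domain. Restricting the search to one domain before scoring keeps unrelated tables out of the candidate set, which is the point of the domain-mapping step in the claim.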