US 12,271,374 B1
Querying heterogeneous data sources using machine learning based language model
Kaycee Kuan-Cheng Lai, San Carlos, CA (US); Aleksey Vinokurov, Yardley, PA (US); and Ravikanth Kasamsetty, Union City, CA (US)
Assigned to Promethium, Inc., Menlo Park, CA (US)
Filed by Promethium, Inc., Menlo Park, CA (US)
Filed on Apr. 23, 2024, as Appl. No. 18/644,083.
Int. Cl. G06F 16/2452 (2019.01); G06F 16/215 (2019.01); G06F 16/242 (2019.01); G06F 16/2453 (2019.01)
CPC G06F 16/24522 (2019.01) [G06F 16/215 (2019.01); G06F 16/242 (2019.01); G06F 16/24539 (2019.01)]
20 Claims
 
1. A computer-implemented method for querying heterogeneous data sources using machine learning based language model, the computer-implemented method comprising:
storing metadata describing a plurality of data assets, wherein each of the plurality of data assets is stored in a data source of a plurality of data sources;
receiving, from a client device, a natural language question;
generating a prompt requesting a database query using syntax of a database query language, the database query corresponding to the natural language question;
sending the prompt to a machine learning based language model;
receiving a database query using syntax of a database query language generated by the machine learning based language model, the database query including one or more generated data asset names;
for each of the one or more generated data asset names, determining a data asset corresponding to the generated data asset name based on metadata describing the data asset;
modifying the database query by replacing each of the one or more generated data asset names by a name of the data asset corresponding to the generated data asset name; and
sending the modified database query for execution.
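
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The catalog entries, the `fake_language_model` stub, the FROM/JOIN regular expression, and the string-similarity matching are all assumptions made for illustration. The claim only requires that each generated data asset name be resolved "based on metadata describing the data asset" and does not prescribe how.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class DataAsset:
    name: str         # fully qualified asset name (hypothetical)
    source: str       # data source that stores the asset
    description: str  # free-text metadata


# Stored metadata describing data assets across several data sources (made-up entries).
CATALOG = [
    DataAsset("crm.public.customer", "postgres", "customer master records"),
    DataAsset("erp.sales.order_items", "snowflake", "line items of customer orders"),
]


def build_prompt(question: str) -> str:
    """Generate a prompt requesting a query in SQL syntax for the question."""
    return f"Write a SQL query that answers the question.\nQuestion: {question}\n"


def fake_language_model(prompt: str) -> str:
    """Stand-in for a real language model call; returns a canned query
    whose table names do not exist in the catalog."""
    return ("SELECT c.name, COUNT(*) FROM customers c "
            "JOIN orders o ON o.customer_id = c.id GROUP BY c.name")


def resolve_asset(generated: str, catalog: list[DataAsset]) -> DataAsset:
    """Pick the catalog asset whose short name is most similar to the generated name.
    String similarity is an assumed matching strategy, not one taken from the claim."""
    def score(asset: DataAsset) -> float:
        short = asset.name.split(".")[-1]
        return difflib.SequenceMatcher(None, generated.lower(), short.lower()).ratio()
    return max(catalog, key=score)


def rewrite_query(query: str, catalog: list[DataAsset]) -> str:
    """Replace each name that follows FROM or JOIN with the resolved asset name."""
    def substitute(match: re.Match) -> str:
        keyword, generated = match.group(1), match.group(2)
        return f"{keyword} {resolve_asset(generated, catalog).name}"
    return re.sub(r"\b(FROM|JOIN)\s+([A-Za-z_][\w.]*)", substitute, query,
                  flags=re.IGNORECASE)


def answer_question(question: str, execute=lambda q: q) -> str:
    """Run the pipeline: prompt, model call, name resolution, rewrite, dispatch.
    `execute` is a placeholder for whatever engine runs the modified query."""
    generated_query = fake_language_model(build_prompt(question))
    modified_query = rewrite_query(generated_query, CATALOG)
    return execute(modified_query)
```

Calling `answer_question("How many orders has each customer placed?")` returns the canned query with `customers` and `orders` replaced by `crm.public.customer` and `erp.sales.order_items`. A simple regular expression is enough for this toy query. Handling subqueries, quoted identifiers, or common table expressions would need a real SQL parser.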