US 12,393,584 B2
Generating training data for a machine learning model that performs text-to-SQL
Octavian Popescu, Westchester, NY (US); Hangu Yeo, Westchester, NY (US); Vadim Sheinin, Yorktown Heights, NY (US); Irene Lizeth Manotas Gutiérrez, White Plains, NY (US); Ngoc Phuoc An Vo, Bronx, NY (US); and Elahe Khorasani, Yorktown Heights, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Aug. 31, 2023, as Appl. No. 18/240,835.
Prior Publication US 2025/0077517 A1, Mar. 6, 2025
Int. Cl. G06F 16/24 (2019.01); G06F 16/2453 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/24544 (2019.01) [G06N 20/00 (2019.01)]
20 Claims
 
1. A method, comprising:
generating a first database from a second database by replacing a plurality of field names in the first database with significant names;
selecting a set of values from a plurality of fields of a table in the first database;
selecting at least one adverb or adjective for the set of values;
determining join paths for values in the set of values;
determining, using a processor, a structured query language (SQL) pattern based, at least in part, on at least one value in the set of values, the at least one adverb or adjective for the set of values, and the join paths for the set of values; and
storing the structured query language pattern to first training data configured, at least in part, for use in machine learning to train a text-to-SQL model, the text-to-SQL model comprising a first artificial neural network and configured to convert first natural language text to a first structured query language query;
selecting a second structured query language query;
identifying nouns in the second structured query language query;
inferring at least one verbal phrase for the nouns in the second structured query language query;
creating at least one natural language sentence from the at least one verbal phrase for the nouns in the second structured query language query;
creating at least one user sentence pattern from the at least one natural language sentence; and
storing the at least one user sentence pattern to the first training data,
wherein the first training data further is configured for use in machine learning to train a SQL-to-text model, the SQL-to-text model comprising a second artificial neural network and configured to convert the first structured query language query to second natural language text.
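
The pattern-generation steps of claim 1 (renaming fields, determining join paths, combining a value, a modifier, and a join path into a SQL pattern, and storing it) can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it is a hypothetical chosen for illustration: the toy schema, the field-name mapping, the `MODIFIERS` table that ties an adjective to a SQL fragment, the `<VALUE>` placeholder standing in for a sampled field value, and the breadth-first search used to find a join path. The sentence-pattern steps of the claim are not covered.

```python
from collections import deque

# Hypothetical mapping from an adjective to the SQL fragment it implies.
MODIFIERS = {
    "highest": "ORDER BY {field} DESC LIMIT 1",
    "lowest": "ORDER BY {field} ASC LIMIT 1",
}


def rename_fields(schema, mapping):
    """Return a copy of the schema with cryptic field names replaced."""
    return {
        table: [mapping.get(field, field) for field in fields]
        for table, fields in schema.items()
    }


def join_path(links, start, goal):
    """Breadth-first search over (table_a, table_b, key) links.

    Returns a list of (from_table, to_table, key) steps, or None if the
    two tables are not connected.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for a, b, key in links:
            if table in (a, b):
                other = b if table == a else a
                if other not in seen:
                    seen.add(other)
                    queue.append((other, path + [(table, other, key)]))
    return None


def build_pattern(target, filter_field, modifier, order_field, start, path):
    """Combine a filter field, a modifier, and a join path into a SQL pattern."""
    joins = "".join(
        f" JOIN {to} ON {frm}.{key} = {to}.{key}" for frm, to, key in path
    )
    tail = MODIFIERS[modifier].format(field=order_field)
    return (
        f"SELECT {target} FROM {start}{joins} "
        f"WHERE {filter_field} = <VALUE> {tail}"
    )


# Toy "second database" schema with cryptic field names (assumed).
schema = {
    "customers": ["c_id", "c_nm", "cty"],
    "orders": ["o_id", "c_id", "o_amt"],
}
mapping = {
    "c_id": "customer_id",
    "c_nm": "customer_name",
    "cty": "city",
    "o_id": "order_id",
    "o_amt": "order_amount",
}

# "First database" schema: same tables, significant field names.
renamed = rename_fields(schema, mapping)

# Foreign-key style links between tables (assumed).
links = [("customers", "orders", "customer_id")]
path = join_path(links, "customers", "orders")

pattern = build_pattern(
    target="customers.customer_name",
    filter_field="customers.city",
    modifier="highest",
    order_field="orders.order_amount",
    start="customers",
    path=path,
)

# Store the pattern in the training data collection.
training_data = [{"sql_pattern": pattern}]
```

In this sketch the `<VALUE>` placeholder keeps the stored entry a pattern rather than a concrete query; a sampled value from the `city` field would be substituted when a concrete training example is needed.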