US 12,287,804 B2
Natural language-based data integration
Shaily Jignesh Fozdar, Boston, MA (US); David Joseph Donahue, Cambridge, MA (US); Fang Liu, Shanghai (CN); Noelle Yanhui Li, New York, NY (US); Abhishek Narain, Woodinville, WA (US); Irene Rogan Shaffer, Cambridge, MA (US); Wee Hyong Tok, Redmond, WA (US); Ehimwenma Nosakhare, Newton, MA (US); Vivek Gupta, Groton, MA (US); Gust Verbruggen, Keerbergen (BE); Vu Minh Le, Redmond, WA (US); Jordan Joseph Henkel, Madison, WI (US); Avrilia Floratou, Sunnyvale, CA (US); Joyce Yu Cahoon, Woodinville, WA (US); Richard Anarfi, Boston, MA (US); Jason Wang, Boston, MA (US); Daniel Muñoz Huerta, Cambridge, MA (US); and Yan Qiu, Shanghai (CN)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Aug. 31, 2023, as Appl. No. 18/241,028.
Prior Publication US 2025/0077538 A1, Mar. 6, 2025
Int. Cl. G06F 17/00 (2019.01); G06F 16/242 (2019.01); G06F 16/25 (2019.01)
CPC G06F 16/254 (2019.01) [G06F 16/243 (2019.01)]
20 Claims
 
1. A method for performing natural language-based data integration, wherein the method is implemented via a service provider device comprising a processor, and wherein the method comprises:
causing execution of a data integration application on a remote device via a network;
causing surfacing of a graphical user interface (GUI) corresponding to the data integration application on a display of the remote device;
receiving, via the GUI, a natural language input representing a data integration task;
generating, via a large language model (LLM), a set of ordered activities corresponding to the data integration task represented by the natural language input;
selecting, via the LLM, at least one application programming interface (API) for performing each activity within the set of ordered activities;
generating a data pipeline based on the set of ordered activities and the at least one API for performing each activity; and
back-translating the data pipeline by converting an intermediate language in which each activity of the set of ordered activities is expressed to a desired data format for execution by the data integration application.
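
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the LLM is replaced by an offline stub (`stub_llm`), and the API catalog, the pipe-delimited intermediate language, and the choice of JSON as the "desired data format" are all assumptions made for illustration. Every function and class name below is hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical catalog of APIs per activity kind (assumption, not from the patent).
API_CATALOG: Dict[str, str] = {
    "copy": "CopyActivityAPI",
    "filter": "FilterActivityAPI",
    "load": "LoadActivityAPI",
}


@dataclass
class Activity:
    order: int
    kind: str
    api: str = ""


def stub_llm(prompt: str) -> str:
    """Offline stand-in for an LLM; returns canned answers so the sketch runs."""
    if prompt.startswith("PLAN:"):
        return "copy\nfilter\nload"
    if prompt.startswith("API:"):
        kind = prompt.split(":", 1)[1].strip()
        return API_CATALOG[kind]
    raise ValueError("unrecognized prompt")


def plan_activities(nl_input: str, llm: Callable[[str], str]) -> List[Activity]:
    """Ask the model for an ordered set of activities for the task."""
    kinds = [k.strip() for k in llm("PLAN: " + nl_input).splitlines() if k.strip()]
    return [Activity(order=i, kind=k) for i, k in enumerate(kinds, start=1)]


def select_apis(activities: List[Activity], llm: Callable[[str], str]) -> List[Activity]:
    """Ask the model to pick one API for each activity."""
    for activity in activities:
        activity.api = llm("API: " + activity.kind)
    return activities


def build_pipeline(activities: List[Activity]) -> List[str]:
    """Express each activity in an assumed intermediate language: order|kind|api."""
    ordered = sorted(activities, key=lambda a: a.order)
    return [f"{a.order}|{a.kind}|{a.api}" for a in ordered]


def back_translate(pipeline_il: List[str]) -> str:
    """Convert the intermediate language to JSON (assumed target format)."""
    steps = []
    for line in pipeline_il:
        order, kind, api = line.split("|")
        steps.append({"order": int(order), "type": kind, "api": api})
    return json.dumps({"activities": steps}, indent=2)


def integrate(nl_input: str, llm: Callable[[str], str] = stub_llm) -> str:
    """Run the full flow: plan, select APIs, build the pipeline, back-translate."""
    activities = select_apis(plan_activities(nl_input, llm), llm)
    return back_translate(build_pipeline(activities))


if __name__ == "__main__":
    print(integrate("Copy sales data, keep 2023 rows, load into the warehouse"))
```

In this sketch, a real model client could be passed as the `llm` argument in place of `stub_llm`. The intermediate representation is kept separate from the final JSON so that the back-translation step stays an explicit, replaceable stage, mirroring the last element of the claim.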