US 12,437,016 B2
Fine-tuning large language model(s) using reinforcement learning with search engine feedback
Hyun Jin Park, Palo Alto, CA (US); and Changwan Ryu, Mountain View, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Dec. 7, 2023, as Appl. No. 18/532,140.
Prior Publication US 2025/0190506 A1, Jun. 12, 2025
Int. Cl. G06F 40/20 (2020.01); G06F 16/9538 (2019.01)
CPC G06F 16/9538 (2019.01) [G06F 40/20 (2020.01)] 20 Claims
OG exemplary drawing
 
1. A method implemented by one or more processors, the method comprising:
identifying an instance of natural language (NL) based input;
processing the instance of NL based input using a large language model (LLM) to generate raw LLM output, where the raw LLM output is NL based output that is responsive to the NL based input;
generating an instance of search engine conditioned NL input based on processing the instance of NL based input using a search engine;
processing the instance of search engine conditioned NL input using the LLM to generate an instance of search engine conditioned output, where the instance of search engine conditioned output is NL based output;
generating a supervision signal based on the raw LLM output and the search engine conditioned output; and
fine-tuning the LLM based on the supervision signal.
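The claimed method can be sketched as a toy pipeline. This is a minimal illustration only, not the patented implementation: every function here (`toy_llm`, `toy_search_engine`, `condition_on_search`, `supervision_signal`, `fine_tune_step`) is a hypothetical stand-in, and the claim does not specify the form of the supervision signal; a token-overlap score is assumed purely for demonstration, with the actual reinforcement-learning weight update elided.

```python
def toy_llm(prompt):
    # Stand-in for a real LLM: deterministically "answers" by echoing the
    # text before the first question mark. A real system would run inference
    # over model weights.
    return "answer: " + prompt.split("?")[0].strip().lower()

def toy_search_engine(query):
    # Stand-in for a search engine: returns canned result snippets.
    # A real system would issue the NL input as a query and collect results.
    return ["Paris is the capital of France.",
            "France's capital city is Paris."]

def condition_on_search(nl_input, snippets):
    # Build the "search engine conditioned NL input" of the claim: the
    # original NL input augmented with retrieved snippets as context.
    context = "\n".join(snippets)
    return f"Context:\n{context}\n\nQuestion: {nl_input}"

def supervision_signal(raw_output, conditioned_output):
    # Assumed (not from the patent): Jaccard token overlap between the raw
    # LLM output and the search-conditioned output. High overlap means the
    # raw output already agrees with the search-grounded answer.
    raw_tokens = set(raw_output.lower().split())
    cond_tokens = set(conditioned_output.lower().split())
    if not raw_tokens or not cond_tokens:
        return 0.0
    return len(raw_tokens & cond_tokens) / len(raw_tokens | cond_tokens)

def fine_tune_step(nl_input):
    # One pass over the claimed steps for a single NL input instance.
    raw = toy_llm(nl_input)                                   # raw LLM output
    conditioned_input = condition_on_search(
        nl_input, toy_search_engine(nl_input))                # conditioned input
    conditioned = toy_llm(conditioned_input)                  # conditioned output
    reward = supervision_signal(raw, conditioned)             # supervision signal
    # In a real system this reward would drive an RL update (for example a
    # policy-gradient step) on the LLM's weights; here it is just returned.
    return reward
```

A usage example: `fine_tune_step("What is the capital of France?")` returns a score in [0, 1] that a reinforcement-learning loop could use as its reward when updating the model.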