US 11,868,381 B2
Systems and methods for training language models to reason over tables
Thomas Müller, Dietikon (CH); Jonathan Herzig, Tel Aviv (IL); Pawel Nowak, Zurich (CH); Julian Eisenschlos, Zurich (CH); Francesco Piccinno, Zurich (CH); and Syrine Krichene, Zurich (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Mar. 29, 2021, as Appl. No. 17/215,465.
Prior Publication US 2022/0309087 A1, Sep. 29, 2022
Int. Cl. G06F 16/332 (2019.01); G06N 3/08 (2023.01); G06F 40/20 (2020.01); G06F 40/284 (2020.01); G06F 40/35 (2020.01)
CPC G06F 16/3329 (2019.01) [G06F 40/20 (2020.01); G06F 40/284 (2020.01); G06F 40/35 (2020.01); G06N 3/08 (2013.01)] 22 Claims
OG exemplary drawing
 
1. A computer-implemented method of training a language model, comprising:
pre-training the language model, using one or more processors of a processing system, based on a plurality of pre-training examples comprising a plurality of counterfactual examples each comprising a respective table, a respective first statement, and a respective second statement; and
fine-tuning the language model, using the one or more processors, based on a plurality of fine-tuning examples each comprising a respective question, a respective answer, and a corresponding table;
wherein, for a first fine-tuning example of the plurality of fine-tuning examples having the respective answer being a scalar, the fine-tuning comprises:
generating an estimated answer to the respective question based on:
the corresponding table;
the language model's predictions of whether the respective answer is based on each cell of a plurality of cells of the corresponding table; and
the language model's predictions of whether the respective answer is based on each aggregation operation of a plurality of aggregation operations;
generating a first loss value based on the estimated answer;
generating a second loss value based on the language model's predictions of whether the respective answer is based on each aggregation operation of the plurality of aggregation operations; and
modifying one or more parameters of the language model based at least on the first and second loss values.
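The fine-tuning step in claim 1 can be illustrated with a minimal sketch. All names here (`estimate_scalar_answer`, `fine_tuning_losses`, the op set, the Huber form of the first loss, and the log-probability form of the second loss) are illustrative assumptions, not taken from the patent: the claim only requires that an estimated answer be formed from the model's per-cell and per-operation predictions, that a first loss be computed from that estimate, and that a second loss be computed from the aggregation predictions.

```python
import numpy as np

# Illustrative aggregation-op set; index 0 is a "no aggregation" option.
OPS = ("NONE", "COUNT", "SUM", "AVERAGE")

def estimate_scalar_answer(cell_values, p_cell, p_agg):
    """Differentiable estimate of a scalar answer.

    cell_values: flat array of numeric table-cell values.
    p_cell: per-cell probabilities that the answer depends on that cell.
    p_agg: probabilities over OPS (NONE, COUNT, SUM, AVERAGE).
    """
    count = np.sum(p_cell)                 # expected number of selected cells
    total = np.sum(p_cell * cell_values)   # expected sum over selected cells
    average = total / max(count, 1e-9)     # soft average
    op_results = np.array([count, total, average])
    # Renormalize over the real aggregation ops (exclude NONE), then mix
    # the per-op results by the model's op probabilities.
    p_ops = p_agg[1:] / max(np.sum(p_agg[1:]), 1e-9)
    return float(np.dot(p_ops, op_results))

def fine_tuning_losses(cell_values, p_cell, p_agg, gold_answer, delta=1.0):
    """First loss: Huber loss on the estimated scalar answer.
    Second loss: pushes probability mass onto the aggregation ops,
    since a scalar answer implies some aggregation was applied."""
    est = estimate_scalar_answer(cell_values, p_cell, p_agg)
    err = abs(est - gold_answer)
    first = 0.5 * err**2 if err <= delta else delta * (err - 0.5 * delta)
    second = -np.log(np.sum(p_agg[1:]) + 1e-9)
    return first, second
```

In this sketch both loss values would feed a gradient step over the model parameters (the "modifying one or more parameters" limitation); the optimizer itself is omitted.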