US 11,741,142 B2
Systems and methods for abstractive document summarization with entity coverage control
Haopeng Zheng, San Francisco, CA (US); Semih Yavuz, Redwood City, CA (US); Wojciech Kryscinski, Palo Alto, CA (US); Kazuma Hashimoto, Menlo Park, CA (US); and Yingbo Zhou, Palo Alto, CA (US)
Assigned to salesforce.com, inc., San Francisco, CA (US)
Filed by salesforce.com, inc., San Francisco, CA (US)
Filed on Jan. 31, 2022, as Appl. No. 17/589,522.
Claims priority of provisional application 63/230,562, filed on Aug. 6, 2021.
Prior Publication US 2023/0054068 A1, Feb. 23, 2023
Int. Cl. G06F 16/34 (2019.01); G06F 40/166 (2020.01); G06N 20/00 (2019.01); G06F 40/117 (2020.01); G06F 40/279 (2020.01)
CPC G06F 16/345 (2019.01) [G06F 40/166 (2020.01); G06N 20/00 (2019.01); G06F 40/117 (2020.01); G06F 40/279 (2020.01)]
20 Claims
1. A method for abstractive summarization of a document, the method comprising:
receiving, via a data interface, a training dataset comprising a plurality of articles and a plurality of summaries corresponding to the plurality of articles;
generating a plurality of article-summary pairs by pairing each article with at least one associated summary;
computing, for an article-summary pair, an entity coverage precision metric based on a number of entity mentions in a corresponding summary or a corresponding article;
determining a pseudo label indicating a faithfulness level of the corresponding article and the corresponding summary based on the computed entity coverage precision metric;
prepending the article with the determined pseudo label as a training input to a summarization model;
generating, by the summarization model, an output summary conditioned on both the article and the prepended pseudo label; and
updating the summarization model based on a training objective comparing the output summary and the corresponding summary.
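
The data-preparation steps of claim 1 (computing the entity coverage precision metric, deriving a pseudo label, and prepending it to the article) might be sketched as follows. This is an illustrative sketch only, not the patented implementation. It assumes that the metric is the fraction of entity mentions in the summary that also appear in the article, that the label is obtained by binning that fraction, and that a regular expression over capitalized words can stand in for a real named-entity recognizer. The function names, the thresholds, and the label tokens `<ENT_LOW>`, `<ENT_MID>`, and `<ENT_HIGH>` are hypothetical. The generation and model-update steps are not shown.

```python
import re

# Hypothetical stand-in for a named-entity recognizer: runs of capitalized
# words are treated as entity mentions. It will also pick up ordinary
# sentence-initial words, so a real system would use an NER model instead.
_ENTITY_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_entities(text):
    """Return the list of (approximate) entity mentions found in text."""
    return _ENTITY_RE.findall(text)


def entity_coverage_precision(article, summary):
    """Fraction of summary entity mentions that also occur in the article."""
    summary_entities = extract_entities(summary)
    if not summary_entities:
        # Assumption: a summary with no entities has nothing unsupported.
        return 1.0
    covered = sum(1 for entity in summary_entities if entity in article)
    return covered / len(summary_entities)


def pseudo_label(precision, thresholds=(0.5, 1.0)):
    """Map a precision value to a discrete faithfulness label (assumed bins)."""
    low, high = thresholds
    if precision < low:
        return "<ENT_LOW>"
    if precision < high:
        return "<ENT_MID>"
    return "<ENT_HIGH>"


def build_training_input(article, summary):
    """Prepend the pseudo label for this article-summary pair to the article."""
    label = pseudo_label(entity_coverage_precision(article, summary))
    return f"{label} {article}"


article = "Alice Smith met Bob in Paris on Monday."
summary = "Alice Smith met Bob in London."
precision = entity_coverage_precision(article, summary)
training_input = build_training_input(article, summary)
```

In this example the summary mentions "London", which the article does not contain, so the pair falls into the middle bin. Under the same assumptions, the highest-faithfulness label could be prepended at inference time to steer the model toward summaries whose entities are all supported by the article.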