US 11,790,184 B2
Systems and methods for scientific contribution summarization
Hiroaki Hayashi, Pittsburgh, PA (US); and Wojciech Kryscinski, San Francisco, CA (US)
Assigned to SALESFORCE.COM, INC., San Francisco, CA (US)
Filed by salesforce.com, inc., San Francisco, CA (US)
Filed on Jan. 28, 2021, as Appl. No. 17/161,327.
Claims priority of provisional application 63/071,673, filed on Aug. 28, 2020.
Prior Publication US 2022/0067302 A1, Mar. 3, 2022
Int. Cl. G06F 40/40 (2020.01); G06F 40/284 (2020.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06F 40/30 (2020.01)
CPC G06F 40/40 (2020.01) [G06F 40/284 (2020.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06F 40/30 (2020.01)]
14 Claims
 
1. A method for generating a summary of a nonfiction article, comprising:
receiving a portion of the nonfiction article, wherein the portion comprises a plurality of text segments and wherein the portion is associated with citation information including: (i) an inbound citation that post-dates the nonfiction article, (ii) an outbound citation that pre-dates the nonfiction article, or (iii) any combination thereof;
inputting the portion of the nonfiction article and the citation information associated with the portion to a natural language processing (NLP) model;
for a text segment of the plurality of text segments in the portion:
(i) determining, by the NLP model, that a first part of the text segment is background information by detecting an outbound citation associated with the text segment;
(ii) determining, by the NLP model, that a second part of the text segment is a new contribution attributable to the nonfiction article by detecting an inbound citation associated with the text segment, or
(iii) any combination thereof;
computing a loss objective based at least in part on: (i) a first conditional probability distribution of the background information conditioned on given information of the outbound citation; (ii) a second conditional probability distribution of the new contribution conditioned on given information of the inbound citation; or (iii) any combination thereof, wherein the loss objective is computed by a cross-entropy loss of the NLP model minus a weighted version of the informativeness term that is based on the first conditional probability distribution or the second conditional probability distribution depending on a mode of the nonfiction article; and
updating the NLP model using the loss objective via backpropagation.
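
The loss objective recited in claim 1 can be illustrated with a minimal Python sketch; it is not the patented implementation. The claim says only that the informativeness term is "based on" one of the two conditional probability distributions, so the sketch assumes it is the mean log-probability that the model assigns to the target tokens given the citation information. The function names, the mode strings "background" and "contribution", and the default weight are hypothetical and chosen only for illustration.

```python
import math


def cross_entropy(token_probs):
    """Mean negative log-likelihood of the reference summary tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def informativeness(cond_probs):
    """ASSUMPTION: mean log-probability of the target tokens conditioned on
    citation information. The claim does not fix this exact form."""
    return sum(math.log(p) for p in cond_probs) / len(cond_probs)


def loss_objective(token_probs, background_probs, contribution_probs,
                   mode, weight=0.1):
    """Cross-entropy minus a weighted informativeness term.

    background_probs:   probabilities conditioned on outbound-citation info
    contribution_probs: probabilities conditioned on inbound-citation info
    mode:               hypothetical switch selecting which distribution
                        feeds the informativeness term
    """
    if mode == "background":
        cond = background_probs
    elif mode == "contribution":
        cond = contribution_probs
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cross_entropy(token_probs) - weight * informativeness(cond)


if __name__ == "__main__":
    # Toy probabilities, made up purely to exercise the functions.
    print(loss_objective([0.5, 0.5], [0.25], [0.5], mode="background"))
    print(loss_objective([0.5, 0.5], [0.25], [0.5], mode="contribution"))
```

In a real training loop the returned scalar would come from a differentiable framework so that it can be backpropagated, as the final step of the claim recites; plain floats are used here only to keep the arithmetic visible.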