CPC G06F 40/40 (2020.01) [G06F 40/284 (2020.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06F 40/30 (2020.01)] | 14 Claims |
1. A method for generating a summary of a nonfiction article, comprising:
receiving a portion of the nonfiction article, wherein the portion comprises a plurality of text segments and wherein the portion is associated with citation information including: (i) an inbound citation that post-dates the nonfiction article, (ii) an outbound citation that pre-dates the nonfiction article, or (iii) any combination thereof;
inputting the portion of the nonfiction article and the citation information associated with the portion to a natural language processing (NLP) model;
for a text segment of the plurality of text segments in the portion:
(i) determining, by the NLP model, that a first part of the text segment is background information by detecting an outbound citation associated with the text segment;
(ii) determining, by the NLP model, that a second part of the text segment is a new contribution attributable to the nonfiction article by detecting an inbound citation associated with the text segment; or
(iii) any combination thereof;
computing a loss objective based at least in part on: (i) a first conditional probability distribution of the background information conditioned on given information of the outbound citation; (ii) a second conditional probability distribution of the new contribution conditioned on given information of the inbound citation; or (iii) any combination thereof, wherein the loss objective is computed by a cross-entropy loss of the NLP model minus a weighted version of the informativeness term that is based on the first conditional probability distribution or the second conditional probability distribution depending on a mode of the nonfiction article; and
updating the NLP model using the loss objective via backpropagation.
|
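The loss objective recited in claim 1 can be illustrated with a minimal sketch. The claim does not define the form of the informativeness term, so the sketch assumes it is the mean log conditional probability of the relevant tokens given the citation information. The function names, the `mode` values ("background" and "contribution"), and the default `weight` are hypothetical and chosen only for illustration. The backpropagation update is omitted; in practice it would be handled by an automatic differentiation framework.

```python
import math
from typing import Sequence


def cross_entropy(target_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood the model assigns to the reference tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def informativeness(cond_probs: Sequence[float]) -> float:
    """Assumed form of the informativeness term: mean log conditional
    probability of tokens given the citation information."""
    return sum(math.log(p) for p in cond_probs) / len(cond_probs)


def loss_objective(
    target_probs: Sequence[float],
    background_probs: Sequence[float],    # p(background | outbound citation)
    contribution_probs: Sequence[float],  # p(new contribution | inbound citation)
    mode: str,
    weight: float = 0.1,                  # hypothetical weighting factor
) -> float:
    """Cross-entropy loss minus a weighted informativeness term.

    The mode selects which conditional distribution feeds the term.
    """
    if mode == "background":
        cond_probs = background_probs
    elif mode == "contribution":
        cond_probs = contribution_probs
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cross_entropy(target_probs) - weight * informativeness(cond_probs)
```

Under this assumed form, a higher conditional probability raises the informativeness term and therefore lowers the loss, so minimizing the objective favors text that is well explained by the selected citation information.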