US 12,147,757 B2
Unifying text segmentation and long document summarization
Sangwoo Cho, Sammamish, WA (US); Kaiqiang Song, Palo Alto, CA (US); Xiaoyang Wang, Palo Alto, CA (US); and Dong Yu, Palo Alto, CA (US)
Assigned to TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed by TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed on Dec. 28, 2022, as Appl. No. 18/090,132.
Prior Publication US 2024/0220709 A1, Jul. 4, 2024
Int. Cl. G06F 40/166 (2020.01); G06F 40/289 (2020.01); G06N 20/00 (2019.01)
CPC G06F 40/166 (2020.01) [G06F 40/289 (2020.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A method executed by at least one processor, the method comprising:
receiving an input comprising natural language texts;
segmenting the natural language texts into a plurality of sections;
summarizing the natural language texts;
developing a first model based on the plurality of sections and the summary of the natural language texts;
identifying two or more salient sentences within the natural language texts using the first model;
determining a sentence quality score for each of the two or more salient sentences;
determining, for each of the two or more salient sentences, a sentence similarity score based on a similarity of the salient sentence to another salient sentence of the two or more salient sentences;
generating a second model, as a negative log-probability of a ground-truth extractive summary, based on performing batch matrix multiplication (BMM) between the sentence quality scores and the sentence similarity scores to calculate a matrix product;
combining the first model and the second model into a final model;
selecting sentences from the natural language texts based on the final model; and
generating an extractive summarization of the natural language texts using the selected sentences.
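
The scoring steps recited above (sentence quality scores, pairwise sentence similarity scores, a matrix product between them, and a negative log-probability of the ground-truth extractive summary) can be illustrated with a minimal sketch. The claim does not name the probability model. The sketch assumes a determinantal point process (DPP) form, in which the matrix product is L = diag(q) S diag(q) and the probability of a summary Y is det(L_Y) / det(L + I). The cosine similarity measure, the function names, the example values, and the weighted-sum combination in `combined_loss` are all hypothetical and are not taken from the claim.

```python
import math

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def batch_matmul(batch_a, batch_b):
    """Batch matrix multiplication: multiply matrices pairwise along the batch axis."""
    return [matmul(a, b) for a, b in zip(batch_a, batch_b)]

def diag(values):
    """Build a diagonal matrix from a list of values."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity (an assumed choice of similarity measure)."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [[sum(a * b for a, b in zip(u, v)) / (nu * nv)
             for v, nv in zip(vectors, norms)]
            for u, nu in zip(vectors, norms)]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det

def kernel(quality, similarity):
    """Assumed kernel L = diag(q) @ S @ diag(q), using a batch of size one."""
    d = diag(quality)
    qs = batch_matmul([d], [similarity])
    return batch_matmul(qs, [d])[0]

def negative_log_prob(L, selected):
    """Assumed DPP form: -log P(Y) = -log det(L_Y) + log det(L + I)."""
    sub = [[L[i][j] for j in selected] for i in selected]
    n = len(L)
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return -math.log(determinant(sub)) + math.log(determinant(L_plus_I))

def combined_loss(first_loss, second_loss, weight=0.1):
    """Hypothetical combination of the two objectives as a weighted sum."""
    return first_loss + weight * second_loss

# Hypothetical data: three salient sentences; sentences 0 and 1 are near-duplicates.
embeddings = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
quality = [0.9, 0.8, 0.5]

L = kernel(quality, cosine_similarity_matrix(embeddings))
loss_diverse = negative_log_prob(L, [0, 2])    # two dissimilar sentences
loss_redundant = negative_log_prob(L, [0, 1])  # two similar sentences
print(loss_diverse, loss_redundant)
```

Under these assumptions, a summary built from dissimilar sentences receives a lower negative log-probability than one built from near-duplicates, even when the near-duplicate has the higher quality score. That is the sense in which the similarity scores would discourage redundant selections.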