US 12,406,139 B2
Query-focused extractive text summarization of textual data
Suman Roy, Bangalore (IN); Vijay Varma Malladi, Hyderabad (IN); and Gaurav Ranjan, Bangalore (IN)
Assigned to Optum, Inc., Minnetonka, MN (US)
Filed by Optum, Inc., Minnetonka, MN (US)
Filed on Aug. 18, 2021, as Appl. No. 17/405,555.
Prior Publication US 2023/0054726 A1, Feb. 23, 2023
Int. Cl. G06F 40/284 (2020.01); G06F 16/3329 (2025.01); G06F 16/35 (2025.01); G06F 40/30 (2020.01); G06N 3/044 (2023.01); G06N 3/08 (2023.01); G10L 15/26 (2006.01)

CPC G06F 40/284 (2020.01) [G06F 16/3329 (2019.01); G06F 16/35 (2019.01); G06F 40/30 (2020.01); G06N 3/044 (2023.01); G06N 3/08 (2013.01); G10L 15/26 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

receiving, by one or more processors, an input data object comprising textual data of a conversation, wherein the textual data comprises a plurality of sentence-level tokens;

generating, by the one or more processors, an interrogative classification for the plurality of sentence-level tokens based at least in part on one or more word-level tokens that respectively correspond to the plurality of sentence-level tokens, wherein generating the interrogative classification comprises indicating, from the plurality of sentence-level tokens, a first interrogative sentence-level token and a second interrogative sentence-level token;

identifying, by the one or more processors, a subtopic portion of the textual data based at least in part on a first location within the textual data and a second location within the textual data, wherein (i) the first location corresponds to the first interrogative sentence-level token, (ii) the second location occurs in the textual data before the second interrogative sentence-level token, and (iii) the subtopic portion comprises a portion of the plurality of sentence-level tokens and the first interrogative sentence-level token;

selecting, by the one or more processors, a sentence-level token from the portion of the plurality of sentence-level tokens in the subtopic portion, wherein the selecting is based at least in part on (i) a determination that a similarity score for the first interrogative sentence-level token and a target query of a plurality of target queries satisfies a threshold similarity score, and (ii) an aggregate characterization score based at least in part on (a) an informativeness score indicative of an informational value of the sentence-level token, and (b) a readability score indicative of a linguistic quality of the sentence-level token, wherein the readability score is based at least in part on one or more probabilities output by a language model;

generating, by the one or more processors, a summarization data object comprising the sentence-level token; and

initiating, by the one or more processors, a performance of one or more summarization-based actions based at least in part on the summarization data object.
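
For illustration only, the steps recited in claim 1 can be sketched in Python as below. This is a minimal toy under stated assumptions, not the patented implementation. Every function name, word list, weight, and threshold value is hypothetical. The interrogative classification is a simple rule (a question mark or a leading wh-word), the similarity score is Jaccard overlap of word-level tokens, the informativeness score is a saturating count of distinct content words, and the language model behind the readability score is an add-one-smoothed unigram model built from the conversation itself. The claim does not prescribe any of these choices.

```python
import math
import re
from collections import Counter

# Hypothetical word lists; the claim does not specify any of these.
WH_WORDS = {"who", "what", "when", "where", "why", "how", "which"}
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "i", "you",
             "it", "my", "your", "for", "in", "on", "what", "how"}


def words(sentence):
    """Word-level tokens: lowercase runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def is_interrogative(sentence):
    """Rule-based stand-in for the interrogative classification."""
    toks = words(sentence)
    return sentence.strip().endswith("?") or (bool(toks) and toks[0] in WH_WORDS)


def subtopics(sentences):
    """Half-open spans [start, end): each starts at an interrogative
    sentence and ends just before the next interrogative sentence."""
    starts = [i for i, s in enumerate(sentences) if is_interrogative(s)]
    ends = starts[1:] + [len(sentences)]
    return list(zip(starts, ends))


def similarity(a, b):
    """Jaccard overlap of word sets (assumed similarity measure)."""
    sa, sb = set(words(a)), set(words(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def informativeness(sentence):
    """Saturating count of distinct non-stopword tokens, in [0, 1)."""
    content = {t for t in words(sentence) if t not in STOPWORDS}
    return len(content) / (len(content) + 2)


def readability(sentence, counts, total):
    """Geometric mean of token probabilities output by a toy
    add-one-smoothed unigram language model."""
    toks = words(sentence)
    if not toks:
        return 0.0
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    logp = sum(math.log((counts[t] + 1) / (total + vocab)) for t in toks)
    return math.exp(logp / len(toks))


def summarize(sentences, target_queries, threshold=0.3, weight=0.5):
    """Pick one sentence per subtopic whose question matches a target query."""
    counts = Counter(t for s in sentences for t in words(s))
    total = sum(counts.values())
    summary = []
    for start, end in subtopics(sentences):
        question, candidates = sentences[start], sentences[start + 1:end]
        best_match = max((similarity(question, q) for q in target_queries),
                         default=0.0)
        # Assumption: "satisfies a threshold" means greater than or equal to it.
        if not candidates or best_match < threshold:
            continue
        # Aggregate characterization score: weighted sum of the two scores.
        summary.append(max(
            candidates,
            key=lambda s: weight * informativeness(s)
            + (1 - weight) * readability(s, counts, total)))
    return summary


conversation = [
    "Hello, thanks for calling.",
    "What is your member ID number?",
    "My member ID number is 12345.",
    "Okay.",
    "How is the weather today?",
    "It is sunny here.",
]
print(summarize(conversation, ["What is the member ID?"]))
# prints ['My member ID number is 12345.']
```

In this toy run, the first question overlaps the target query enough to pass the assumed threshold, so the highest-scoring sentence of its subtopic is kept. The weather question falls below the threshold, so its subtopic contributes nothing to the summarization data object.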