| CPC G06F 40/284 (2020.01) [G06F 16/3329 (2019.01); G06F 16/35 (2019.01); G06F 40/30 (2020.01); G06N 3/044 (2023.01); G06N 3/08 (2013.01); G10L 15/26 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
receiving, by one or more processors, an input data object comprising textual data of a conversation, wherein the textual data comprises a plurality of sentence-level tokens;
generating, by the one or more processors, an interrogative classification for the plurality of sentence-level tokens based at least in part on one or more word-level tokens that respectively correspond to the plurality of sentence-level tokens, wherein generating the interrogative classification comprises indicating, from the plurality of sentence-level tokens, a first interrogative sentence-level token and a second interrogative sentence-level token;
identifying, by the one or more processors, a subtopic portion of the textual data based at least in part on a first location within the textual data and a second location within the textual data, wherein (i) the first location corresponds to the first interrogative sentence-level token, (ii) the second location occurs in the textual data before the second interrogative sentence-level token, and (iii) the subtopic portion comprises a portion of the plurality of sentence-level tokens and the first interrogative sentence-level token;
selecting, by the one or more processors, a sentence-level token from the portion of the plurality of sentence-level tokens in the subtopic portion, wherein the selecting is based at least in part on (i) a determination that a similarity score for the first interrogative sentence-level token and a target query of a plurality of target queries satisfies a threshold similarity score, and (ii) an aggregate characterization score based at least in part on (a) an informativeness score indicative of an informational value of the sentence-level token, and (b) a readability score indicative of a linguistic quality of the sentence-level token, wherein the readability score is based at least in part on one or more probabilities output by a language model;
generating, by the one or more processors, a summarization data object comprising the sentence-level token; and
initiating, by the one or more processors, a performance of one or more summarization-based actions based at least in part on the summarization data object.
|
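
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and every name, heuristic, and default value in it is an assumption made for illustration: a trailing question mark stands in for the interrogative classification, Jaccard word overlap stands in for the similarity score, the share of non-stopword tokens stands in for the informativeness score, a toy unigram probability table stands in for the language model behind the readability score, and the aggregate characterization score is a simple weighted sum.

```python
import math
import re


def tokenize(sentence):
    """Split a sentence-level token into lowercase word-level tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def is_interrogative(sentence):
    # Assumption: a trailing "?" marks a question. The claim only requires
    # a classification based on word-level tokens, not this heuristic.
    return sentence.strip().endswith("?")


def subtopic_portions(sentences):
    """Return (question_index, following_sentences) pairs. Each portion
    starts at a question and ends just before the next question."""
    q_idx = [i for i, s in enumerate(sentences) if is_interrogative(s)]
    portions = []
    for n, start in enumerate(q_idx):
        end = q_idx[n + 1] if n + 1 < len(q_idx) else len(sentences)
        portions.append((start, sentences[start + 1:end]))
    return portions


def similarity(a, b):
    # Assumption: Jaccard overlap of word sets as the similarity score.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def informativeness(sentence, stopwords):
    # Assumption: fraction of word-level tokens that are not stopwords.
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(w not in stopwords for w in words) / len(words)


def readability(sentence, unigram_probs, floor=1e-4):
    # Assumption: geometric-mean word probability under a toy unigram
    # model; unseen words receive the hypothetical `floor` probability.
    words = tokenize(sentence)
    if not words:
        return 0.0
    avg_logp = sum(math.log(unigram_probs.get(w, floor)) for w in words) / len(words)
    return math.exp(avg_logp)


def summarize(sentences, target_queries, unigram_probs, stopwords,
              threshold=0.5, weight=0.5):
    """Pick one sentence per subtopic whose question matches a target query."""
    summary = []
    for q_i, answers in subtopic_portions(sentences):
        if not answers:
            continue
        best_match = max(
            (similarity(sentences[q_i], t) for t in target_queries), default=0.0
        )
        if best_match < threshold:
            continue  # question does not satisfy the threshold similarity
        # Aggregate characterization score: weighted sum of the two scores.
        summary.append(max(
            answers,
            key=lambda s: weight * informativeness(s, stopwords)
            + (1 - weight) * readability(s, unigram_probs),
        ))
    return summary


# Hypothetical conversation and target query, for illustration only.
conversation = [
    "Hello, thanks for calling.",
    "What is your member ID?",
    "Um, let me see.",
    "My member ID is 12345.",
    "How is the weather?",
    "It is sunny.",
]
stopwords = {"is", "my", "the", "um", "let", "me", "see", "it", "how", "what", "your"}
print(summarize(conversation, ["What is your member ID?"], {}, stopwords))
```

In this sketch the second question is dropped because it does not resemble any target query, and the filler reply in the first subtopic loses to the reply that carries the member ID. A production system would presumably replace each heuristic with a trained model, as the claim's reference to a language model suggests.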