| CPC G10L 15/1815 (2013.01) [G06F 16/338 (2019.01); G06F 16/35 (2019.01); G10L 15/063 (2013.01); G10L 15/10 (2013.01); G10L 15/14 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2013.01)] | 30 Claims |

|
1. A computer-implemented method executed using a computing device, the method comprising:
digitally generating and storing a machine learning statistical topic model in computer memory, the topic model being programmed to model call transcript data representing words spoken on a call as a function of one or more topics of a set of topics, the set of topics being modeled to comprise a set of pre-seeded topics and a set of non-pre-seeded topics, and the one or more topics being modeled as a function of a probability distribution of topics;
programmatically pre-seeding the topic model with a set of keyword groups, each keyword group associating a respective set of keywords with a topic of the set of pre-seeded topics;
programmatically training the topic model using unlabeled training data;
conjoining a classifier to the topic model to create a classifier model, the classifier defining a joint probability distribution over topic vectors and one or more observed labels;
programmatically training the classifier model using labeled training data;
receiving target call transcript data comprising an electronic digital representation of a verbal transcription of a target call;
programmatically determining, using the classifier model, at least one of one or more topics of the target call or one or more classifications of the target call;
digitally storing the target call transcript data with additional data indicating the determined one or more topics of the target call and/or the determined one or more classifications of the target call;
accessing, in computer storage, a first digitally stored electronic document comprising a first text;
receiving computer input specifying a search query comprising one or more search terms;
processing the search query using the classifier model to output a query topic vector representing a thematic content of the search query;
processing the first text using the classifier model to output and store in the computer memory a first plurality of topic vectors each representing a topic in the first text;
using the query topic vector and the first plurality of topic vectors, calculating a plurality of similarity values, each of the similarity values representing a similarity of the query topic vector to a particular topic vector among the first plurality of topic vectors;
outputting a visual display that specifies one or more topic vectors among the first plurality of topic vectors having one or more corresponding similarity values that are greater than a specified threshold similarity value.
|
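The pre-seeding and unsupervised training steps of claim 1 can be illustrated with a small keyword-seeded LDA. This is a sketch under stated assumptions, not the patented implementation: it assumes collapsed Gibbs sampling, encodes each keyword group as an asymmetric Dirichlet prior that adds `seed_boost` to the group's keywords in its pre-seeded topic, and treats every remaining topic as non-pre-seeded. The names `train_seeded_lda` and `infer_topic_vector` and all hyperparameter defaults are hypothetical.

```python
import random
from collections import defaultdict


def train_seeded_lda(docs, seed_groups, num_free_topics, iterations=200,
                     alpha=0.1, beta=0.01, seed_boost=1.0, rng_seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for LDA with seeded topics.

    docs: token lists from unlabeled transcripts (unlabeled training data).
    seed_groups: keyword sets; group i pre-seeds topic i. Topics after the
    seeded ones are the non-pre-seeded topics.
    Returns (theta, phi): per-document topic distributions and per-topic
    word distributions.
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    num_seeded = len(seed_groups)
    num_topics = num_seeded + num_free_topics
    topics = range(num_topics)

    def prior(k, w):
        # Assumed seeding scheme: a keyword gets extra prior mass in its own topic.
        if k < num_seeded and w in seed_groups[k]:
            return beta + seed_boost
        return beta

    prior_sum = [sum(prior(k, w) for w in vocab) for k in topics]
    n_dk = [[0] * num_topics for _ in docs]           # document-topic counts
    n_kw = [defaultdict(int) for _ in topics]         # topic-word counts
    n_k = [0] * num_topics                            # tokens per topic
    z = []                                            # topic assignment per token

    def update(d, k, w, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Initialise: seed keywords start in their seeded topic, other tokens at random.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            seeded = [k for k in range(num_seeded) if w in seed_groups[k]]
            k = seeded[0] if seeded else rng.randrange(num_topics)
            z_doc.append(k)
            update(d, k, w, 1)
        z.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, z[d][i], w, -1)  # remove the token's current assignment
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + prior(t, w))
                           / (n_k[t] + prior_sum[t]) for t in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, z[d][i], w, 1)

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in topics] for d, doc in enumerate(docs)]
    phi = [{w: (n_kw[k][w] + prior(k, w)) / (n_k[k] + prior_sum[k]) for w in vocab}
           for k in topics]
    return theta, phi


def infer_topic_vector(tokens, phi, alpha=0.1, iterations=50, rng_seed=0):
    """Fold a new text (query or transcript) into the model with phi held fixed."""
    rng = random.Random(rng_seed)
    topics = range(len(phi))
    known = [w for w in tokens if w in phi[0]]  # out-of-vocabulary tokens are skipped
    counts = [0] * len(phi)
    z = [rng.randrange(len(phi)) for _ in known]
    for k in z:
        counts[k] += 1
    for _ in range(iterations):
        for i, w in enumerate(known):
            counts[z[i]] -= 1
            weights = [(counts[t] + alpha) * phi[t][w] for t in topics]
            z[i] = rng.choices(topics, weights=weights)[0]
            counts[z[i]] += 1
    total = len(known) + len(phi) * alpha
    return [(c + alpha) / total for c in counts]
```

For instance, `train_seeded_lda(docs, [{"refund", "billing"}, {"router", "install"}], num_free_topics=1)` would fit two pre-seeded topics and one free topic; `infer_topic_vector` could then map a search query or a target transcript to a topic vector.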
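The claim recites a classifier that defines a joint probability distribution over topic vectors and observed labels, trained on labeled data. One simple generative choice, assumed here rather than taken from the patent, is Gaussian naive Bayes over the topic proportions, p(θ, y) = p(y) ∏ₖ N(θₖ; μ_yk, σ²_yk). The class `GaussianTopicClassifier` and its `var_floor` parameter are hypothetical.

```python
import math
from collections import defaultdict


class GaussianTopicClassifier:
    """Hypothetical sketch of p(theta, y) = p(y) * prod_k N(theta_k; mu_yk, var_yk)."""

    def __init__(self, var_floor=1e-3):
        # Floor keeps variances positive when a topic weight barely varies in a class.
        self.var_floor = var_floor
        self.params = {}

    def fit(self, topic_vectors, labels):
        grouped = defaultdict(list)
        for vec, y in zip(topic_vectors, labels):
            grouped[y].append(vec)
        n = len(labels)
        for y, vecs in grouped.items():
            dims = range(len(vecs[0]))
            means = [sum(v[k] for v in vecs) / len(vecs) for k in dims]
            variances = [max(sum((v[k] - means[k]) ** 2 for v in vecs) / len(vecs),
                             self.var_floor) for k in dims]
            self.params[y] = (math.log(len(vecs) / n), means, variances)
        return self

    def log_joint(self, vec, y):
        """log p(theta, y) under the fitted model."""
        log_prior, means, variances = self.params[y]
        total = log_prior
        for x, m, v in zip(vec, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def posterior(self, vec):
        """p(y | theta), obtained by normalising the joint over labels."""
        logs = {y: self.log_joint(vec, y) for y in self.params}
        top = max(logs.values())
        unnorm = {y: math.exp(v - top) for y, v in logs.items()}
        norm = sum(unnorm.values())
        return {y: p / norm for y, p in unnorm.items()}

    def predict(self, vec):
        return max(self.params, key=lambda y: self.log_joint(vec, y))
```

A Gaussian ignores the fact that topic vectors lie on the probability simplex; a Dirichlet class-conditional density or a supervised topic model would respect that constraint at the cost of a more involved fit.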
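The last steps of the claim compare a query topic vector with the topic vectors of the first text, keep those whose similarity exceeds a threshold, and display them. The claim does not name a similarity measure; cosine similarity is assumed here, `matching_topic_vectors` and `render_matches` are hypothetical names, and a plain-text table stands in for the visual display.

```python
import math


def cosine_similarity(a, b):
    """Assumed similarity measure; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_topic_vectors(query_vec, text_vecs, threshold):
    """Return (index, similarity) pairs strictly above threshold, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(text_vecs)]
    hits = [(i, s) for i, s in scored if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def render_matches(hits):
    """Hypothetical text-table stand-in for the claimed visual display."""
    lines = ["topic vector  similarity"]
    lines += [f"{i:>12}  {s:.3f}" for i, s in hits]
    return "\n".join(lines)
```

Because topic vectors are probability distributions, a divergence-based score (for example one built on the Jensen-Shannon or Hellinger distance) is a common alternative; only `cosine_similarity` would need to change.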