US 12,322,382 B2
Topic-based semantic search of electronic documents based on machine learning models from Bayesian Belief networks
Michael Kingsley McCourt, Jr., Santa Barbara, CA (US); Kian Ghodoussi, Los Angeles, CA (US); and Victor Borda, Santa Barbara, CA (US)
Assigned to Invoca, Inc., Santa Barbara, CA (US)
Filed by INVOCA, INC., Santa Barbara, CA (US)
Filed on Mar. 17, 2023, as Appl. No. 18/185,496.
Prior Publication US 2024/0312451 A1, Sep. 19, 2024
Int. Cl. G10L 15/22 (2006.01); G06F 16/338 (2019.01); G06F 16/35 (2019.01); G10L 15/06 (2013.01); G10L 15/10 (2006.01); G10L 15/14 (2006.01); G10L 15/18 (2013.01); G10L 15/197 (2013.01)
CPC G10L 15/1815 (2013.01) [G06F 16/338 (2019.01); G06F 16/35 (2019.01); G10L 15/063 (2013.01); G10L 15/10 (2013.01); G10L 15/14 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2013.01)]
30 Claims
1. A computer-implemented method executed using a computing device, the method comprising:
digitally generating and storing a machine learning statistical topic model in computer memory, the topic model being programmed to model call transcript data representing words spoken on a call as a function of one or more topics of a set of topics, the set of topics being modeled to comprise a set of pre-seeded topics and a set of non-pre-seeded topics, and the one or more topics being modeled as a function of a probability distribution of topics;
programmatically pre-seeding the topic model with a set of keyword groups, each keyword group associating a respective set of keywords with a topic of the set of pre-seeded topics;
programmatically training the topic model using unlabeled training data;
conjoining a classifier to the topic model to create a classifier model, the classifier defining a joint probability distribution over topic vectors and one or more observed labels;
programmatically training the classifier model using labeled training data;
receiving target call transcript data comprising an electronic digital representation of a verbal transcription of a target call;
programmatically determining, using the classifier model, at least one of one or more topics of the target call or one or more classifications of the target call;
digitally storing the target call transcript data with additional data indicating the determined one or more topics of the target call and/or the determined one or more classifications of the target call;
accessing, in computer storage, a first digitally stored electronic document comprising a first text;
receiving computer input specifying a search query comprising one or more search terms;
processing the search query using the classifier model to output a query topic vector representing a thematic content of the search query;
processing the first text using the classifier model to output and store in the computer memory a first plurality of topic vectors each representing a topic in the text;
using the query topic vector and the first plurality of topic vectors, calculating a plurality of similarity values, each of the similarity values representing a similarity of the query topic vector to a particular topic vector among the first plurality of topic vectors;
outputting a visual display that specifies one or more topic vectors among the first plurality of topic vectors having one or more corresponding similarity values that are greater than a specified threshold similarity value.
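
The final three steps of claim 1 (comparing a query topic vector against a document's topic vectors and keeping those above a threshold) can be illustrated with a minimal sketch. The claim does not name a similarity measure, so cosine similarity is an assumption here. The function names, the three-dimensional example vectors, and the threshold of 0.9 are likewise hypothetical and are not taken from the patent. The classifier model that would produce the vectors is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def topics_above_threshold(query_vector, topic_vectors, threshold):
    """Return (index, similarity) pairs whose similarity exceeds threshold,
    ordered from most to least similar."""
    scored = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(topic_vectors)
    ]
    kept = [pair for pair in scored if pair[1] > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical vectors standing in for classifier model output.
query_topic_vector = [0.7, 0.2, 0.1]
first_plurality = [
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

matches = topics_above_threshold(query_topic_vector, first_plurality, 0.9)
for index, similarity in matches:
    print(f"topic vector {index}: similarity {similarity:.3f}")
```

With these example values, the second vector falls well below the threshold and is omitted, while the other two are returned in descending order of similarity. A real system would pass the returned pairs to whatever component renders the visual display.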