US 11,941,038 B2
Transparent and controllable topic modeling
Raghu Kiran Ganti, White Plains, NY (US); Mudhakar Srivatsa, White Plains, NY (US); Shreeranjani Srirangamsridharan, San Jose, CA (US); Jae-Wook Ahn, Nanuet, NY (US); Michele Merler, New York City, NY (US); and Dean Steuer, White Plains, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on May 19, 2022, as Appl. No. 17/748,263.
Prior Publication US 2023/0376518 A1, Nov. 23, 2023
Int. Cl. G06F 16/35 (2019.01); G06F 40/30 (2020.01); G06F 40/40 (2020.01)
CPC G06F 16/358 (2019.01) [G06F 40/30 (2020.01); G06F 40/40 (2020.01)] 20 Claims
 
1. A computer-implemented method for controlling and visualizing topic modeling results, the computer-implemented method comprising:
inputting, by a processor, a dataset into a hierarchical topic modeling algorithm configured for hierarchical clustering analysis and natural language processing (NLP) of the dataset;
generating, by the processor, a set of clusters based on a first set of parameters inputted into the hierarchical topic modeling algorithm, wherein each cluster represents a topic identified from the dataset;
outputting, by the processor, an interactive two-dimensional (2D) spatial distribution of the set of clusters to a user interface, wherein the interactive 2D spatial distribution is obtained through a multidimensional scaling of semantic embeddings, and nodes of the interactive 2D spatial distribution each represent a cluster of the set of clusters and distance between the nodes depicts a level of similarity between topics represented by the nodes;
selecting, by the processor, a first node of the interactive 2D spatial distribution being displayed by the user interface; and
in response to selecting the first node of the interactive 2D spatial distribution, visually generating, by the processor, an individual topic view of the first node based on refining the hierarchical topic modeling via an iterative interaction feedback loop, wherein the individual topic view comprises a semantic summary explaining topic definitions for the first node and structural attributes explaining how the topic of the first node differs from the remaining nodes of the 2D spatial distribution.
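The claimed pipeline (cluster a dataset into topics, lay the topic nodes out in 2D by multidimensional scaling, then inspect one selected node) can be illustrated with a minimal Python sketch. It is not the patented implementation, and several pieces are assumptions: the toy documents and hand-picked 3-D vectors stand in for real semantic embeddings, average-linkage agglomerative clustering stopped at a fixed `n_clusters` stands in for the hierarchical topic modeling algorithm and its "first set of parameters", classical (Torgerson) MDS is computed by power iteration, and all helper names (`agglomerative`, `centroid`, `classical_mds`, `topic_view`) are hypothetical. The topic view reports the most frequent terms as a crude semantic summary, terms ranked by how much more often they occur in the selected cluster than in the others as a stand-in for the structural attributes, and the nearest neighbouring node in the 2D layout.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus and hand-picked 3-D vectors: hypothetical stand-ins for the
# output of a real semantic-embedding model.
DOCS = ["goal match team", "team coach match",
        "stock market price", "market price trade",
        "bank loan rate market", "loan rate credit market"]
EMBEDDINGS = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1],
              [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
              [0.0, 0.6, 0.8], [0.1, 0.5, 0.9]]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(vectors, n_clusters):
    """Average-linkage hierarchical clustering, stopped at n_clusters
    (a hypothetical stand-in for the claim's 'first set of parameters')."""
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        total = sum(euclid(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i still points at the merged cluster
    return clusters


def centroid(vectors, members):
    return [sum(vectors[m][d] for m in members) / len(members)
            for d in range(len(vectors[0]))]


def classical_mds(points, dims=2, iters=300):
    """Classical (Torgerson) MDS: double-center the squared distances, then
    take the top eigenpairs by power iteration with deflation."""
    n = len(points)
    d2 = [[euclid(p, q) ** 2 for q in points] for p in points]
    row = [sum(r) / n for r in d2]
    grand = sum(row) / n
    # Element-wise form of B = -1/2 * J D^2 J (D^2 is symmetric).
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + 1.0) for i in range(n)]  # deterministic start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining spectrum is (numerically) zero
            v, lam = [x / norm for x in w], norm
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def topic_view(selected, clusters, docs, coords, top_n=3):
    """Summary terms for the selected node, terms that set it apart from the
    other nodes, and its nearest neighbour in the 2-D layout."""
    inside = Counter(w for m in clusters[selected] for w in docs[m].split())
    outside = Counter(w for k, c in enumerate(clusters) if k != selected
                      for m in c for w in docs[m].split())
    others = [k for k in range(len(clusters)) if k != selected]
    nearest = min(others, key=lambda k: euclid(coords[selected], coords[k]))
    return {
        "summary": [w for w, _ in inside.most_common(top_n)],
        # Rank by in-cluster count minus count in all other clusters.
        "distinctive": sorted(inside, key=lambda w: (outside[w] - inside[w], w))[:top_n],
        "nearest_topic": nearest,
        "distance": euclid(coords[selected], coords[nearest]),
    }


if __name__ == "__main__":
    clusters = agglomerative(EMBEDDINGS, n_clusters=3)
    centroids = [centroid(EMBEDDINGS, c) for c in clusters]
    coords = classical_mds(centroids)
    for k, (x, y) in enumerate(coords):
        print(f"topic {k}: docs={clusters[k]} at ({x:.3f}, {y:.3f})")
    print(topic_view(1, clusters, DOCS, coords))  # "select" node 1
```

Because classical MDS on three centroids is exact in two dimensions, node distances in the layout equal the centroid distances, so closer nodes do correspond to more similar topics here; with more topics the 2D layout only approximates them. Re-running `agglomerative` with a different `n_clusters` after inspecting a node is one simple way to mimic the claimed iterative feedback loop; the claim itself does not say which parameters are adjusted.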