US 12,248,752 B2
Primary topic identification in text data
Nadiya Kochura, Bolton, MA (US); Fang Lu, Billerica, MA (US); Kelley Anders, East New Market, MD (US); and Jonathan D. Dunne, Dungarvan (IE)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 25, 2021, as Appl. No. 17/303,231.
Prior Publication US 2022/0382986 A1, Dec. 1, 2022
Int. Cl. G06F 40/30 (2020.01); G06F 40/194 (2020.01); H04L 51/02 (2022.01)
CPC G06F 40/30 (2020.01) [G06F 40/194 (2020.01); H04L 51/02 (2013.01)]
18 Claims
 
1. A computer-implemented method for directing a chat agent for an online discussion, the method comprising:
retrieving message data and collaborative data from a messaging program;
identifying a plurality of topics present in the message data using topic modeling;
determining a respective topic distance for each of the plurality of the topics present in the message data, wherein the respective topic distance for each of the plurality of the topics is a geodesic distance between two or more messages directed towards a respective topic and a cosine similarity of the two or more messages;
determining a collaborative distance between two or more participants discussing a respective topic for each of the plurality of the topics, wherein the collaborative distance is based on a role assigned to at least one participant in the message data and a frequency of communication between the two or more participants;
analyzing each of the plurality of the topics to determine one or more likelihood probabilities that indicate whether a respective topic is a primary topic or a secondary topic based on the collaborative distances and the respective topic distances for each of the plurality of the topics;
combining each of the plurality of the topic's respective one or more likelihood probabilities to calculate a combined likelihood for each of the plurality of the topics;
comparing each of the plurality of the topic's combined likelihood to a determined threshold value to determine if a respective topic is a primary topic or a secondary topic; and
modifying the message data to designate each of the plurality of the topics as a primary topic or a secondary topic.
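
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say how the geodesic distance and cosine similarity are merged, how role and communication frequency are weighted, or how likelihoods are combined, so every formula below is an assumption chosen for illustration: geodesic distance is taken as the hop count in a hypothetical reply graph, cosine similarity uses raw word counts, a distance d is mapped to a likelihood 1 / (1 + d), and likelihoods are combined by averaging. All function names, the role weights, and the 0.5 threshold are hypothetical. Topic identification is assumed to have already happened, so topics arrive as lists of message ids.

```python
import math
from collections import Counter, deque
from itertools import combinations


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two messages using raw word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def geodesic_distance(graph: dict, src, dst) -> float:
    """Shortest-path hop count between two messages (breadth-first search)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return math.inf  # the two messages are not connected


def topic_distance(messages: dict, graph: dict, ids: list) -> float:
    """Assumed formula: mean over message pairs of hops * (1 - cosine)."""
    pairs = list(combinations(ids, 2))
    if not pairs:
        return math.inf  # the claim requires two or more messages
    total = 0.0
    for i, j in pairs:
        hops = geodesic_distance(graph, i, j)
        if math.isinf(hops):
            return math.inf
        total += hops * (1.0 - cosine_similarity(messages[i], messages[j]))
    return total / len(pairs)


def collaborative_distance(role_weight: float, message_count: int) -> float:
    """Assumed formula: a lower role weight and more frequent
    communication both bring participants closer together."""
    return role_weight / (1 + message_count)


def to_likelihood(distance: float) -> float:
    """Assumed mapping from a distance to a likelihood in [0, 1]."""
    return 1.0 / (1.0 + distance)


def classify_topic(topic_d: float, collab_d: float, threshold: float = 0.5) -> str:
    """Average the two likelihoods and compare against the threshold."""
    combined = (to_likelihood(topic_d) + to_likelihood(collab_d)) / 2
    return "primary" if combined >= threshold else "secondary"


if __name__ == "__main__":
    # Hypothetical chat data: message id -> text, plus a reply graph.
    messages = {
        1: "deploy failed again",
        2: "deploy failed on staging",
        3: "lunch plans today",
    }
    reply_graph = {1: [2], 2: [1, 3], 3: [2]}
    topics = {"deploy": [1, 2], "lunch": [3]}
    collab = {
        "deploy": collaborative_distance(1.0, 4),
        "lunch": collaborative_distance(3.0, 0),
    }
    # "Modifying the message data" is modeled as attaching a label per topic.
    message_data = {"messages": messages, "topic_labels": {}}
    for name, ids in topics.items():
        d = topic_distance(messages, reply_graph, ids)
        message_data["topic_labels"][name] = classify_topic(d, collab[name])
    print(message_data["topic_labels"])
```

A production system would more likely use TF-IDF or embedding vectors for the similarity and a learned model for the likelihoods. The sketch only shows how the two distances feed a thresholded primary or secondary decision.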