CPC G06F 16/285 (2019.01) [G06N 20/00 (2019.01)] | 13 Claims
1. A system comprising:
a computing device configured to:
obtain a plurality of data items over a threshold analysis period from an incoming database in response to a threshold analysis interval elapsing, the plurality of data items corresponding to at least one parameter;
select a categorization model from a model database based on the at least one parameter of the plurality of data items;
for each data item of the plurality of data items, apply the categorization model to the data item and identify at least one topic associated with the corresponding data item, by:
comparing the data item to a set of known topics;
determining a similarity based on a distance value between each known topic of the set of known topics and the data item;
categorizing the data item as a corresponding known topic of the set of known topics when the data item is within a threshold distance of the corresponding known topic; and
identifying the data item as an unknown data item when the data item is outside the threshold distance of each known topic of the set of known topics;
for at least one unknown data item in the plurality of data items:
access, via a distributed communications network, public resources;
compare the at least one unknown data item to public data of the public resources;
generate a new topic, outside the set of known topics, for the at least one unknown data item, wherein the new topic has a title determined based on the public data; and
categorize the at least one unknown data item as the new topic;
generate a visualization indicating a frequency of topics based on data items corresponding to each topic, the visualization including the new topic with an indication that the at least one unknown data item was categorized into an unknown topic category and public resources were used to determine the title of the new topic; and
transmit the visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.
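The categorization and frequency steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation, and everything specific in it is an assumption made for illustration: data items and known topics are represented as numeric vectors, the distance value is Euclidean, an item within the threshold of several topics is assigned to the nearest one, and the hypothetical `title_unknown` callback stands in for the step that derives a new topic title from public resources.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def categorize(item_vec, known_topics, threshold):
    """Return the nearest known topic within `threshold`, or None if unknown.

    Choosing the nearest topic is an assumption; the claim only requires
    that the item be within the threshold distance of the topic.
    """
    best_name, best_dist = None, float("inf")
    for name, centroid in known_topics.items():
        dist = euclidean(item_vec, centroid)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


def topic_frequencies(items, known_topics, threshold, title_unknown):
    """Count items per topic; unknown items get a title from `title_unknown`.

    `title_unknown` is a hypothetical placeholder for the public-resource
    lookup, which is not modeled here.
    """
    counts = Counter()
    new_topics = set()
    for vec in items:
        topic = categorize(vec, known_topics, threshold)
        if topic is None:
            topic = title_unknown(vec)
            new_topics.add(topic)  # flag so a visualization can mark it as new
        counts[topic] += 1
    return counts, new_topics


# Illustrative data only: two known topics and three data items.
known = {"billing": (0.0, 0.0), "shipping": (10.0, 0.0)}
items = [(0.5, 0.0), (9.0, 1.0), (5.0, 5.0)]
counts, new_topics = topic_frequencies(
    items, known, threshold=2.0, title_unknown=lambda vec: "new-topic"
)
```

In this sketch the third item lies farther than the threshold from both known topics, so it is counted under the placeholder title and recorded in `new_topics`, which is the information a frequency visualization would need to mark the new topic as such.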