CPC G06F 40/40 (2020.01) [G06F 40/279 (2020.01); G06V 30/19093 (2022.01); G06V 30/19107 (2022.01)] | 20 Claims |
1. A short text similarity based clustering system comprising:
one or more processors;
one or more memory components communicatively coupled to the one or more processors; and
machine-readable instructions stored in the one or more memory components that cause the short text similarity based clustering system to perform at least the following when executed by the one or more processors:
receive a plurality of documents comprising short text data;
determine a plurality of forward similarity values based on the short text data in each of the plurality of documents, wherein a forward similarity value of one indicates a subsequent document comprises at least the short text data of a preceding document;
determine a plurality of reverse similarity values based on the short text data in each of the plurality of documents, wherein a reverse similarity value of one indicates the preceding document comprises at least the short text data of the subsequent document;
generate a forward and reverse similarity matrix based on the plurality of forward similarity values and the plurality of reverse similarity values; and
generate a plurality of short text similarity based clusters to group the short text data of the plurality of documents based on the forward and reverse similarity matrix.
|
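The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how the similarity values are computed or how clusters are derived from the matrix, so everything below is an assumption made for illustration: whitespace tokenization, a token-containment ratio as the similarity measure, a single asymmetric matrix whose upper triangle holds forward values and whose lower triangle holds reverse values, and threshold-based linking with union-find as the grouping step. The names `tokens`, `containment`, `similarity_matrix`, `cluster`, and the `threshold` parameter are hypothetical and do not come from the patent.

```python
from itertools import combinations


def tokens(text):
    """Lowercased whitespace tokens as a set (an assumed tokenization)."""
    return set(text.lower().split())


def containment(a, b):
    """Fraction of tokens in a that also occur in b; 1.0 means b covers all of a."""
    return len(a & b) / len(a) if a else 0.0


def similarity_matrix(docs):
    """Assumed layout: for i < j, [i][j] is the forward value and [j][i] the reverse value."""
    sets = [tokens(d) for d in docs]
    n = len(sets)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        # Forward: 1.0 when the subsequent document j contains all text of the preceding document i.
        matrix[i][j] = containment(sets[i], sets[j])
        # Reverse: 1.0 when the preceding document i contains all text of the subsequent document j.
        matrix[j][i] = containment(sets[j], sets[i])
    return matrix


def cluster(docs, threshold=0.8):
    """Group document indices whose forward or reverse value meets an assumed threshold."""
    matrix = similarity_matrix(docs)
    parent = list(range(len(docs)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(docs)), 2):
        if max(matrix[i][j], matrix[j][i]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for idx in range(len(docs)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())
```

Under these assumptions, calling `cluster` on a list of short strings returns lists of document indices, one list per cluster. Keeping the forward and reverse values separate in one asymmetric matrix preserves the direction of containment, which a symmetric measure would discard.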