| CPC G06F 40/30 (2020.01) [G06F 16/367 (2019.01)] | 24 Claims |

|
1. A method for determining semantic relationships between hashtags in social media messages, the method comprising:
receiving, by one or more processors, social media data including a plurality of social media messages;
performing, by the one or more processors, natural language processing (NLP) on the social media data to extract a plurality of hashtags included in the plurality of social media messages;
determining, by the one or more processors and for each of at least two hashtags of the plurality of hashtags, one or more co-occurrence frequency counts for the hashtag and other hashtags of the plurality of hashtags, wherein a co-occurrence frequency count for any two hashtags represents a count of social media messages of the plurality of social media messages that include both the two hashtags;
detecting, by the one or more processors and for each of the at least two hashtags, at least one community of a plurality of communities based at least in part on the co-occurrence frequency counts of the at least two hashtags, wherein the at least one community comprises a topic semantically related to the hashtag;
determining, by the one or more processors and for the at least two hashtags, one or more diversity metrics that indicate a distribution of the at least two hashtags across the plurality of social media messages or with respect to other hashtags of the plurality of hashtags,
wherein the one or more diversity metrics include a community diversity metric that measures an amount of other hashtags from different communities of the plurality of communities with which a corresponding hashtag co-occurs in the same social media message,
wherein the community diversity metric corresponds to how many different communities are detected for each of the at least two hashtags;
determining, by the one or more processors and for each of the at least two hashtags, an ensemble score based on a combination of the one or more diversity metrics that correspond to the hashtag;
generating, by the one or more processors, a hierarchical ontology of hashtags based on the ensemble scores for the at least two hashtags and the one or more co-occurrence frequency counts for the at least two hashtags; and
storing, by the one or more processors, hashtag semantic data that represents the hierarchical ontology.
|
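The co-occurrence counting and community diversity steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The function names, the regular expression used in place of full NLP extraction, the `min_count` threshold, and the use of connected components as a stand-in for community detection are all assumptions made for illustration. The ensemble score and ontology generation steps are not shown.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: a simple regex stands in for the NLP extraction step.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(message):
    """Return the set of lowercase hashtags found in one message."""
    return {tag.lower() for tag in HASHTAG_RE.findall(message)}


def cooccurrence_counts(messages):
    """Count, per hashtag pair, the messages that contain both hashtags."""
    counts = Counter()
    for message in messages:
        tags = sorted(extract_hashtags(message))
        for a, b in combinations(tags, 2):  # each pair is ordered a < b
            counts[(a, b)] += 1
    return counts


def detect_communities(counts, min_count=2):
    """Hypothetical stand-in for community detection: connected components
    of the graph whose edges are pairs with count >= min_count."""
    neighbors = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            neighbors[a].add(b)
            neighbors[b].add(a)
    community = {}
    next_id = 0
    for start in sorted(neighbors):
        if start in community:
            continue
        community[start] = next_id
        stack = [start]
        while stack:  # depth-first walk over one component
            node = stack.pop()
            for other in neighbors[node]:
                if other not in community:
                    community[other] = next_id
                    stack.append(other)
        next_id += 1
    return community


def community_diversity(counts, community):
    """For each hashtag, count the distinct communities of the hashtags
    it co-occurs with in at least one message."""
    seen = defaultdict(set)
    for a, b in counts:
        if b in community:
            seen[a].add(community[b])
        if a in community:
            seen[b].add(community[a])
    return {tag: len(ids) for tag, ids in seen.items()}


messages = [
    "#nba #lakers game tonight",
    "#nba #lakers win",
    "#nfl #chiefs highlights",
    "#nfl #chiefs again",
    "#sports #nba recap",
    "#sports #nfl recap",
]
counts = cooccurrence_counts(messages)
communities = detect_communities(counts)
diversity = community_diversity(counts, communities)
```

In this toy data, `sports` co-occurs with hashtags from two different communities while `nba` co-occurs with only one, which is the kind of signal a hierarchy could use to place broader hashtags above narrower ones.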