| CPC G06F 16/21 (2019.01) | 18 Claims | 

| 
               1. A system comprising: 
            one or more processors; and 
                a memory storing code for determining a relatedness of content items to categories, wherein, when executed, the code causes the one or more processors to perform operations comprising: 
              pre-processing content to obtain information in the content, classifying the content as pertaining to one or more categories based on the information, and organizing the content in terms of relevancy to the categories; 
                  identifying a particular content item of the pre-processed content, a relevancy score associated with the particular content item, and a set of categories to which the particular content item is classified as related; 
                  generating glossaries associated with the set of categories, wherein generating each glossary of the glossaries comprises: 
                  using a glossary manager: 
                  directing a word stemming module to stem words in a collection of content to reduce words to a base form; 
                      receiving the collection of content with the words stemmed; 
                      identifying business content that is tagged to a particular category using a word frequency module and a glossary word score module to calculate glossary word scores for words that occur in the content tagged to the particular category; and 
                      applying a frequency threshold when generating the glossary and excluding a particular word from the glossary when the particular word does not occur in at least a predetermined number of pieces of content that are tagged to the particular category; 
                    based on probability values in the set of glossaries associated with the set of categories, calculating a set of affinity scores that each represent a degree of relevancy between the particular content item and a category in the set of categories, wherein each glossary associated with a particular category in the set of categories comprises a set of words and a corresponding set of glossary word scores that represent the probability values that given content items are related to the particular category when the content items contain the word associated with the glossary word score; and 
                  modifying the relevancy score associated with the particular content item based on the calculated set of affinity scores. 
                 |
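
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions the claim does not specify: the toy `stem` function stands in for the word stemming module, the glossary word score is estimated as the fraction of content containing a word that is tagged to the category, the affinity score is taken as the mean glossary word score of the item's words, and the relevancy score is adjusted by a weighted blend with the highest affinity. All function names and the `min_docs` and `weight` parameters are hypothetical.

```python
from collections import Counter


def stem(word):
    # Toy stemmer (assumption): strips a few common English suffixes.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokens(text):
    # Set of distinct stemmed words in one piece of content.
    return {stem(w) for w in text.lower().split()}


def build_glossary(docs, category, min_docs=2):
    """Build a glossary for one category.

    docs: list of (text, set_of_categories) pairs.
    Returns {word: score}, where score estimates the probability that
    content containing the word is tagged to the category (assumed formula).
    """
    total = Counter()   # pieces of content containing each word
    in_cat = Counter()  # pieces tagged to the category containing each word
    for text, cats in docs:
        words = tokens(text)
        total.update(words)
        if category in cats:
            in_cat.update(words)
    # Frequency threshold: drop words found in fewer than min_docs
    # pieces of content tagged to the category.
    return {w: n / total[w] for w, n in in_cat.items() if n >= min_docs}


def affinity(text, glossary):
    # Assumed affinity: mean glossary word score over the item's glossary words.
    scores = [glossary[w] for w in tokens(text) if w in glossary]
    return sum(scores) / len(scores) if scores else 0.0


def adjust_relevancy(score, affinities, weight=0.5):
    # Assumed adjustment: blend the prior score with the highest affinity.
    if not affinities:
        return score
    return (1 - weight) * score + weight * max(affinities)
```

In this sketch, a glossary would be built once per category from the tagged content, and `affinity` would then be evaluated for the particular content item against each category's glossary before `adjust_relevancy` is applied to its relevancy score.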