US 12,248,439 B2
Affinity scoring
Ashutosh Joshi, Fremont, CA (US); Martin Betz, Palo Alto, CA (US); David Cooke, Los Altos, CA (US); Rajiv Arora, Gurgaon (IN); Binay Mohanty, New Delhi (IN); and Ansuman Mishra, New Delhi (IN)
Assigned to Aurea Software, Inc., Austin, TX (US)
Filed by Aurea Software, Inc., Austin, TX (US)
Filed on Feb. 11, 2020, as Appl. No. 16/788,149.
Application 16/788,149 is a continuation of application No. 13/754,856, filed on Jan. 30, 2013, granted, now 10,592,480.
Claims priority of provisional application 61/757,133, filed on Jan. 26, 2013.
Claims priority of provisional application 61/747,345, filed on Dec. 30, 2012.
Prior Publication US 2020/0183893 A1, Jun. 11, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/21 (2019.01)

CPC G06F 16/21 (2019.01)

18 Claims

1. A system comprising:

one or more processors; and

a memory having code stored therein for determining a relatedness of content items to categories, wherein when executed the code causes the one or more processors to perform operations comprising:

pre-processing content to obtain information in the content, classify the content as pertaining to one or more categories based on the information, and organize the content in terms of relevancy to categories;

identifying a particular content item of the pre-processed content, a relevancy score associated with the particular content item, and a set of categories to which the particular content item is classified as related;

generating glossaries associated with the set of categories, wherein generating each glossary of the glossaries comprises:

using a glossary manager:

directing a word stemming module to stem words in a collection of content to reduce words to a base form;

receiving the collection of content with the words stemmed;

identifying business content that is tagged to a particular category using a word frequency module and a glossary word score module to calculate glossary word scores for words that occur in the content tagged to the particular category; and

applying a frequency threshold when generating the glossary and excluding a particular word in the glossary when the particular word does not occur in at least a predetermined number of pieces of content that are tagged to the particular category;

based on probability values in the set of glossaries associated with the set of categories, calculating a set of affinity scores that each represent a degree of relevancy between the particular content item and a category in the set of categories, wherein each glossary associated with a particular category in the set of categories comprises a set of words and a corresponding set of glossary word scores that represent the probability values that given content items are related to the particular category when the content items contain the word associated with the glossary word score; and

modifying the relevancy score associated with the particular content item based on the calculated set of affinity scores.
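
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; every name in it (stem, build_glossary, affinity, adjust_relevancy, min_docs, weight) is hypothetical. It assumes a toy suffix-stripping stemmer, a glossary word score estimated as the fraction of documents containing a word that are tagged to the category, an affinity score taken as the mean glossary score of the matching words, and a linear blend for modifying the relevancy score. The claim does not specify any of these formulas.

```python
from collections import Counter


def stem(word):
    # Toy stemmer (assumption): strips a few common suffixes.
    # A real system would likely use a proper stemming algorithm.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, split on whitespace, and reduce each word to a base form.
    return [stem(w) for w in text.lower().split()]


def build_glossary(docs, category, min_docs=2):
    """Build {word: score} for one category.

    docs is a list of (text, set_of_category_tags) pairs.
    The score is an assumed estimate of P(category | word): documents
    tagged to the category that contain the word, divided by all
    documents that contain the word.
    """
    tagged = Counter()  # documents tagged to `category` that contain the word
    total = Counter()   # all documents that contain the word
    for text, tags in docs:
        words = set(tokenize(text))
        total.update(words)
        if category in tags:
            tagged.update(words)
    # Frequency threshold: drop words seen in fewer than min_docs tagged documents.
    return {w: tagged[w] / total[w] for w in tagged if tagged[w] >= min_docs}


def affinity(text, glossary):
    # Assumed affinity: mean glossary score over the item's words
    # that appear in the glossary; 0.0 when none match.
    scores = [glossary[w] for w in tokenize(text) if w in glossary]
    return sum(scores) / len(scores) if scores else 0.0


def adjust_relevancy(relevancy, affinities, weight=0.5):
    # Assumed modification rule: blend the prior relevancy score
    # with the mean affinity across the item's categories.
    if not affinities:
        return relevancy
    mean_affinity = sum(affinities) / len(affinities)
    return (1 - weight) * relevancy + weight * mean_affinity


docs = [
    ("merger talks advancing", {"m&a"}),
    ("merger rumors", {"m&a"}),
    ("weather talks", {"climate"}),
]
strict = build_glossary(docs, "m&a", min_docs=2)
loose = build_glossary(docs, "m&a", min_docs=1)
item_affinity = affinity("merger talks", loose)
new_relevancy = adjust_relevancy(0.6, [item_affinity])
```

With min_docs=2, only "merger" survives the frequency threshold, because it is the only stem that occurs in at least two documents tagged to the category. Lowering the threshold to 1 admits "talk" with a score of 0.5, since half of the documents containing it are tagged to the category.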