US 11,775,596 B1
Models for classifying documents
Ashutosh Joshi, Fremont, CA (US); Martin Betz, Palo Alto, CA (US); Rajiv Arora, Gurgaon (IN); Rakesh Kumar Srivastava, New Delhi (IN); and David Cooke, Los Altos, CA (US)
Assigned to Aurea Software, Inc., Austin, TX (US)
Filed by Aurea Software, Inc., Austin, TX (US)
Filed on Apr. 21, 2022, as Appl. No. 17/726,482.
Application 17/726,482 is a continuation of application No. 16/691,963, filed on Nov. 22, 2019, granted, now 11,341,194.
Application 16/691,963 is a continuation of application No. 15/662,271, filed on Jul. 27, 2017, granted, now 10,489,441, issued on Nov. 26, 2019.
Application 15/662,271 is a continuation of application No. 12/772,168, filed on Apr. 30, 2010, granted, now 9,760,634, issued on Sep. 12, 2017.
Claims priority of provisional application 61/316,824, filed on Mar. 23, 2010.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/951 (2019.01); G06F 7/08 (2006.01); G06F 16/35 (2019.01)
CPC G06F 16/951 (2019.01) [G06F 7/08 (2013.01); G06F 16/353 (2019.01)]
16 Claims
 
1. A method for defining and utilizing a content relevance model for a particular category to at least in part determine whether a content segment is relevant to the particular category, the method comprising:
executing code by a processor for a first computer system to cause the processor of the first computer system to perform operations comprising:
sending a first set of content segments that contain content relevant to the particular category and a second set of content segments that contain content not relevant to the particular category; and
receiving, from a second computer system, documents that are relevant to one or more particular categories, wherein the documents are identified by a second computer system executing code by a processor to perform operations comprising:
receiving the first set of content segments;
identifying a set of key word sets more likely to appear in the first set of content segments than the second set of content segments; and
defining a content relevance model that comprises a set of groups of word sets and a score for each group, each of the groups of word sets comprising a key word set from the set of key word sets and at least one word set found in a context of the key word set in at least one of the received content segments, wherein defining the content relevance model further comprises:
determining the set of key word sets for the particular category based on an analysis of (i) a first set of content segments defined as relevant to the particular category and (ii) a second set of content segments defined as not relevant to the particular category;
determining (i) a set of pairs of word sets that each comprise a key word set and a word set that appears in a defined context of the keyword and (ii) a score for each of the word set pairs, the score for a particular word set pair quantifying a likelihood that a content segment containing the word set pair is relevant to the particular category; and
defining a content relevance model for the particular category, the content relevance model comprising (i) a context definition and (ii) the set of word set pairs and corresponding scores;
utilizing the content relevance model in a system to identify content segments in documents for relevancy to the one or more particular categories;
providing the documents that are relevant to the one or more particular categories to the first computer system.
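
The model-building and scoring steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not fix any of the choices made here. The sketch assumes that each "word set" is a single lowercase token, that the "context definition" is a fixed token window around the key word, that key words are selected by a smoothed ratio of segment frequencies, and that each pair score is a smoothed log ratio. All function names, parameters, thresholds, and sample segments are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: a "word set" is simplified to one lowercase token.
    return text.lower().split()

def find_key_words(relevant, irrelevant, min_ratio=3.0):
    """Select words more likely to appear in relevant than in irrelevant segments."""
    rel_df = Counter(w for seg in relevant for w in set(tokenize(seg)))
    irr_df = Counter(w for seg in irrelevant for w in set(tokenize(seg)))
    keys = set()
    for word, count in rel_df.items():
        # Add-one smoothing keeps the ratio finite for unseen words.
        p_rel = (count + 1) / (len(relevant) + 2)
        p_irr = (irr_df[word] + 1) / (len(irrelevant) + 2)
        if p_rel / p_irr >= min_ratio:
            keys.add(word)
    return keys

def context_pairs(segment, keys, window):
    """Return (key word, context word) pairs found within the token window."""
    tokens = tokenize(segment)
    pairs = set()
    for i, token in enumerate(tokens):
        if token in keys:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            pairs.update((token, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

def build_model(relevant, irrelevant, window=2):
    """Build a model holding a context definition plus scored word pairs."""
    keys = find_key_words(relevant, irrelevant)
    rel = Counter(p for s in relevant for p in context_pairs(s, keys, window))
    irr = Counter(p for s in irrelevant for p in context_pairs(s, keys, window))
    scores = {}
    for pair in set(rel) | set(irr):
        p_rel = (rel[pair] + 1) / (len(relevant) + 2)
        p_irr = (irr[pair] + 1) / (len(irrelevant) + 2)
        # Positive score: the pair is more typical of relevant segments.
        scores[pair] = math.log(p_rel / p_irr)
    return {"window": window, "keys": keys, "scores": scores}

def score_segment(model, segment):
    """Sum the scores of all known pairs that occur in the segment."""
    pairs = context_pairs(segment, model["keys"], model["window"])
    return sum(model["scores"].get(p, 0.0) for p in pairs)

# Hypothetical training segments for a "mergers and acquisitions" category.
relevant = [
    "acme acquires rival firm",
    "bank acquires startup firm",
    "merger deal acquires assets",
]
irrelevant = [
    "team wins final game",
    "weather stays sunny today",
    "firm weather report",
]

model = build_model(relevant, irrelevant)
on_topic = score_segment(model, "giant acquires rival")
off_topic = score_segment(model, "team wins game")
```

In this sketch a segment would be treated as relevant when its summed score exceeds a chosen threshold. The threshold, like the window size, is a tuning choice that the claim leaves open.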