CPC G06N 5/04 (2013.01) [G06F 16/81 (2019.01); G06N 20/00 (2019.01)] | 20 Claims |
1. A non-transitory computer readable medium having instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform operations for creating a plurality of classification rules to classify an electronic document from an electronic media source, the operations comprising:
generating statistical data from one or more training documents, and creating a plurality of classification rules, including creating at least one topic model-based classification rule using the statistical data, the at least one topic model-based classification rule formatted as an XML file;
evaluating the at least one topic model-based classification rule using a precision equation and a recall equation, the precision equation comprising
$$\text{Precision} = \frac{N(dc+)}{N(dc+) + N(dc-)}$$
and the recall equation comprising
$$\text{Recall} = \frac{N(dc+)}{N(dc)}$$
wherein N(dc+) is a number of test documents correctly classified to a category C based on the at least one topic model-based classification rule, N(dc−) is a number of test documents incorrectly classified to the category C based on the at least one topic model-based classification rule, and N(dc) denotes a number of test documents that should be classified to the category C; and
creating at least one query-based classification rule using one or more user defined categories and the statistical data, the at least one query-based classification rule formatted as an XML file.
|
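The evaluation step of claim 1 can be illustrated with a minimal Python sketch. It assumes the standard reading of the claim's definitions: precision is N(dc+) / (N(dc+) + N(dc−)) and recall is N(dc+) / N(dc). The function name `precision_recall`, the dictionary-based inputs, and the sample document ids and categories are all hypothetical and are not taken from the patent.

```python
def precision_recall(predicted, expected, category):
    """Compute precision and recall for one category C.

    predicted: dict mapping test document id -> category assigned by the rule
    expected:  dict mapping test document id -> category the document should have
    (Both structures are illustrative assumptions, not part of the claim.)
    """
    # N(dc+): test documents correctly classified to the category
    n_correct = sum(
        1 for doc, cat in predicted.items()
        if cat == category and expected.get(doc) == category
    )
    # N(dc-): test documents incorrectly classified to the category
    n_incorrect = sum(
        1 for doc, cat in predicted.items()
        if cat == category and expected.get(doc) != category
    )
    # N(dc): test documents that should be classified to the category
    n_should = sum(1 for cat in expected.values() if cat == category)

    assigned = n_correct + n_incorrect
    # Guard against division by zero when nothing was assigned or expected
    precision = n_correct / assigned if assigned else 0.0
    recall = n_correct / n_should if n_should else 0.0
    return precision, recall


# Hypothetical test set: four documents, evaluated for the category "sports"
predicted = {"d1": "sports", "d2": "sports", "d3": "politics", "d4": "sports"}
expected = {"d1": "sports", "d2": "politics", "d3": "sports", "d4": "sports"}

precision, recall = precision_recall(predicted, expected, "sports")
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this sample, two documents are correctly assigned to "sports", one is incorrectly assigned, and three should have been assigned, so both measures come out to 2/3. Returning 0.0 when a denominator is zero is a design choice of this sketch; the claim does not say how that case is handled.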