US 11,928,606 B2
Systems and methods for classifying electronic documents
Caimei Lu, Kirkwood, MO (US); Ricky John Prosser, St. Charles, MO (US); and Michael Wayne Bryars, Chesterfield, MO (US)
Assigned to TSG Technologies, LLC, Brentwood, MO (US)
Filed by TSG Technologies, LLC, St. Louis, MO (US)
Filed on Feb. 3, 2020, as Appl. No. 16/780,413.
Application 16/780,413 is a continuation of application No. 15/629,332, filed on Jun. 21, 2017, granted, now 10,579,646.
Application 15/629,332 is a continuation of application No. 15/069,661, filed on Mar. 14, 2016, granted, now 9,710,540, issued on Jul. 18, 2017.
Application 15/069,661 is a continuation of application No. 13/839,817, filed on Mar. 15, 2013, granted, now 9,298,814, issued on Mar. 29, 2016.
Prior Publication US 2021/0241136 A1, Aug. 5, 2021
Int. Cl. G06N 5/04 (2023.01); G06F 16/81 (2019.01); G06N 20/00 (2019.01)
CPC G06N 5/04 (2013.01) [G06F 16/81 (2019.01); G06N 20/00 (2019.01)]
20 Claims
1. A non-transitory computer readable medium having instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform operations for creating a plurality of classification rules to classify an electronic document from an electronic media source, the operations comprising:
generating statistical data from one or more training documents, and creating a plurality of classification rules, including creating at least one topic model-based classification rule using the statistical data, the at least one topic model-based classification rule formatted as an XML file;
evaluating the at least one topic model-based classification rule using a precision equation and a recall equation, the precision equation comprising
$$\text{Precision} = \frac{N(dc+)}{N(dc+) + N(dc-)}$$
and the recall equation comprising
$$\text{Recall} = \frac{N(dc+)}{N(dc)}$$
wherein N(dc+) is a number of test documents correctly classified to a category C based on the at least one topic model-based classification rule, N(dc−) is a number of test documents incorrectly classified to the category C based on the at least one topic model-based classification rule, and N(dc) denotes a number of test documents that should be classified to the category C; and
creating at least one query-based classification rule using one or more user defined categories and the statistical data, the at least one query-based classification rule formatted as an XML file.
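The evaluation step recited in claim 1 can be illustrated with a minimal Python sketch. It assumes precision is N(dc+) divided by the sum of N(dc+) and N(dc−), and recall is N(dc+) divided by N(dc), as implied by the claim's definitions of those counts. The names `RuleEvaluation` and `evaluate_rule`, the use of document identifiers held in sets, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleEvaluation:
    """Precision and recall of one classification rule for one category C."""
    precision: float
    recall: float


def evaluate_rule(predicted, expected):
    """Evaluate a rule against labelled test documents (hypothetical helper).

    predicted: ids of test documents the rule classified to category C.
    expected:  ids of test documents that should be classified to category C.
    """
    predicted = set(predicted)
    expected = set(expected)

    n_correct = len(predicted & expected)    # N(dc+): correctly classified to C
    n_incorrect = len(predicted - expected)  # N(dc-): incorrectly classified to C
    n_should = len(expected)                 # N(dc): should be classified to C

    # Assumption: report 0.0 rather than raising when a denominator is zero.
    classified = n_correct + n_incorrect
    precision = n_correct / classified if classified else 0.0
    recall = n_correct / n_should if n_should else 0.0
    return RuleEvaluation(precision=precision, recall=recall)


if __name__ == "__main__":
    # Four documents were classified to C; five documents belong in C.
    result = evaluate_rule(
        predicted={"d1", "d2", "d3", "d4"},
        expected={"d1", "d2", "d3", "d5", "d6"},
    )
    print(result)
```

In this example three of the four documents classified to C are correct, so precision is 3/4, and three of the five documents that belong in C were found, so recall is 3/5.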