US 12,147,948 B2
Systems and methods for determination, description, and use of feature sets for machine learning classification systems, including electronic messaging systems employing machine learning classification
Peter Gallagher McNeil, Leesburg, VA (US)
Assigned to ZIX CORPORATION, Dallas, TX (US)
Filed by Zix Corporation, Dallas, TX (US)
Filed on Dec. 4, 2023, as Appl. No. 18/528,423.
Application 18/528,423 is a continuation of application No. 18/160,496, filed on Jan. 27, 2023, granted, now 11,887,061.
Prior Publication US 2024/0257060 A1, Aug. 1, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 15/16 (2006.01); G06Q 10/107 (2023.01); G09B 5/02 (2006.01); G09B 19/00 (2006.01); H04L 51/21 (2022.01)
CPC G06Q 10/107 (2013.01) [H04L 51/21 (2022.05)]
21 Claims
1. A method, comprising:
receiving a corpus of training content, wherein the training content comprises a plurality of training emails;
generating a first value for each of a plurality of features, wherein the plurality of features include at least one feature associated with at least a portion of an email address including one or more of a domain segment, a local segment or a friendly segment;
generating a feature descriptor for each of the plurality of features based on at least one attribute of an associated feature for the feature descriptor;
generating a first classification value for each of a plurality of cross-tabulations including the plurality of features based on the first values for each of the plurality of features;
receiving a corpus of live content;
parsing the live content based on a plurality of feature descriptors for the plurality of features to identify one or more of the features in the live content;
generating a second value for each of the identified one or more features in the live content;
generating a second classification value for one or more of the plurality of cross-tabulations including the identified one or more features; and
based on the generated second classification value for the one or more cross-tabulations including the identified one or more features, classifying the live content.
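The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how feature values, feature descriptors, cross-tabulations, or classification values are computed, so every choice below is an assumption made for illustration. The sketch assumes that each email is reduced to its From header, that a feature descriptor is a (segment, normaliser) pair, that a cross-tabulation is a pair of feature values, and that a classification value is a Laplace-smoothed fraction of training emails labelled "spam". The names `FEATURE_DESCRIPTORS`, `extract_features`, `cross_tabs`, `train`, and `classify` are hypothetical.

```python
from collections import Counter
from email.utils import parseaddr
from itertools import combinations

# Hypothetical feature descriptors: feature name -> (address segment, normaliser).
FEATURE_DESCRIPTORS = {
    "from.friendly": ("friendly", str.lower),
    "from.local": ("local", str.lower),
    "from.domain": ("domain", str.lower),
}

def _segments(from_header):
    """Split a From header into friendly, local, and domain segments."""
    friendly, addr = parseaddr(from_header)
    local, _, domain = addr.rpartition("@")
    return {"friendly": friendly, "local": local, "domain": domain}

def extract_features(from_header):
    """Parse content using the descriptors and return one value per feature."""
    seg = _segments(from_header)
    return {name: norm(seg[key]) for name, (key, norm) in FEATURE_DESCRIPTORS.items()}

def cross_tabs(features):
    """Assumed form of a cross-tabulation: every pair of (feature, value) items."""
    return list(combinations(sorted(features.items()), 2))

def train(training_emails):
    """training_emails: iterable of (from_header, label), label 'spam' or 'ham'."""
    spam, total = Counter(), Counter()
    for header, label in training_emails:
        for cell in cross_tabs(extract_features(header)):
            total[cell] += 1
            if label == "spam":
                spam[cell] += 1
    # First classification value per cross-tabulation: smoothed spam fraction.
    return {cell: (spam[cell] + 1) / (total[cell] + 2) for cell in total}

def classify(from_header, model, threshold=0.5):
    """Score live content by averaging the values of the cross-tabulations it hits."""
    cells = [c for c in cross_tabs(extract_features(from_header)) if c in model]
    if not cells:
        return "unknown", None
    score = sum(model[c] for c in cells) / len(cells)
    return ("spam" if score > threshold else "ham"), score

if __name__ == "__main__":
    corpus = [
        ('"Acme Billing" <billing@acme.example>', "ham"),
        ('"Acme Billing" <billing@acme.example>', "ham"),
        ('"Acme Billing" <billing@acme-pay.example>', "spam"),
        ('"Acme Billing" <billing@acme-pay.example>', "spam"),
    ]
    model = train(corpus)
    print(classify('"Acme Billing" <billing@acme-pay.example>', model))
```

In this toy corpus the friendly and local segments are identical for legitimate and spoofed senders, so only the cross-tabulations that include the domain segment separate the two classes. That is one plausible reason to score combinations of address segments rather than each segment alone.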