US 11,055,351 B1
Frequent pattern mining on a frequent hierarchical pattern tree
Chi-Ren Shyu, Columbia, MO (US); and Michael Phinney, Columbia, MO (US)
Assigned to The Curators of the University of Missouri, Columbia, MO (US)
Filed by The Curators of the University of Missouri, Columbia, MO (US)
Filed on Apr. 17, 2018, as Appl. No. 15/955,462.
Claims priority of provisional application 62/486,119, filed on Apr. 17, 2017.
Int. Cl. G06F 16/00 (2019.01); G06F 16/901 (2019.01); G06F 16/2458 (2019.01); G06F 16/28 (2019.01); G06F 16/22 (2019.01); G06F 16/23 (2019.01)
CPC G06F 16/9027 (2019.01) [G06F 16/2246 (2019.01); G06F 16/2379 (2019.01); G06F 16/2465 (2019.01); G06F 16/285 (2019.01); G06F 2216/03 (2013.01)]
12 Claims
1. A method of building a tree structure to support efficient mining of frequent patterns with respect to a data set, the method comprising:
a processor scanning a data set, the data set comprising a plurality of items arranged as a plurality of transactions, wherein the scanning comprises the processor identifying each item that appears in the data set with at least a minimum frequency;
a processor creating a plurality of leaf nodes for a hierarchical tree structure based on the identified items such that each leaf node corresponds to a different identified item, and wherein each leaf node is associated with the transactions from the data set that have the corresponding item as a member; and
a processor creating a plurality of higher level intermediate nodes for the hierarchical tree structure based on agglomerative clustering that builds out the hierarchical tree structure from the leaf nodes upward such that (1) each intermediate node has a plurality of child nodes, wherein the child nodes are non-overlapping with respect to their corresponding items, (2) each intermediate node corresponds to the items that correspond to its child nodes, (3) each intermediate node is associated with a transaction set corresponding to the transactions associated with the corresponding items of its child nodes, and (4) a similarity measure between child nodes and a corresponding threshold is used to identify child nodes that are grouped as sibling nodes under an intermediate node, wherein the similarity measure measures a similarity with respect to the corresponding transaction sets of the child nodes.
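The three steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names Node, jaccard, build_leaves, build_level and build_tree are hypothetical, and several choices are assumptions the claim does not state: Jaccard similarity as the similarity measure, a greedy seed-based grouping as the agglomerative step, the union of the children's transactions as the transaction set of an intermediate node, and a minimum frequency expressed as a transaction count.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    items: frozenset   # items this node corresponds to
    tids: frozenset    # ids of the transactions associated with the node
    children: list = field(default_factory=list)


def jaccard(a, b):
    """Similarity of two transaction-id sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_leaves(transactions, min_count):
    """Scan the data set once; one leaf per item seen in >= min_count transactions."""
    tids_by_item = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tids_by_item.setdefault(item, set()).add(tid)
    return [
        Node(frozenset([item]), frozenset(tids))
        for item, tids in sorted(tids_by_item.items())
        if len(tids) >= min_count
    ]


def build_level(nodes, threshold):
    """Greedily group nodes whose similarity to a seed meets the threshold."""
    remaining = list(nodes)
    next_level = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed]
        for other in remaining[:]:
            if jaccard(seed.tids, other.tids) >= threshold:
                group.append(other)
                remaining.remove(other)
        if len(group) > 1:
            # Siblings never share items, so the parent's item set is a disjoint union.
            items = frozenset().union(*(n.items for n in group))
            # Assumption: the parent's transaction set is the union of its children's.
            tids = frozenset().union(*(n.tids for n in group))
            next_level.append(Node(items, tids, group))
        else:
            next_level.append(seed)  # no similar sibling found; carry the node upward
    return next_level


def build_tree(transactions, min_count, threshold):
    """Build bottom-up from the leaves; return the list of root nodes."""
    level = build_leaves(transactions, min_count)
    while len(level) > 1:
        next_level = build_level(level, threshold)
        if len(next_level) == len(level):  # nothing merged at this threshold
            break
        level = next_level
    return level
```

Calling build_tree(transactions, min_count, threshold) on a list of item lists returns the root nodes. A lower threshold merges more aggressively and tends toward a single root, while a higher threshold may leave several unmerged roots.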