| CPC G06N 7/01 (2023.01) [G06N 5/047 (2013.01); G06V 10/955 (2022.01)] | 31 Claims |

|
1. A processor for discovering a plurality of hierarchical patterns in datasets, the processor comprising a plurality of functional elements comprising:
a plurality of state transition elements; and
a plurality of counters,
wherein the processor is capable of a replacement of symbol sets of the plurality of state transition elements and threshold values of the plurality of counters,
wherein the plurality of counters are configured to work with the plurality of state transition elements to increase space efficiency of automata implementation,
wherein the plurality of hierarchical patterns include continuous or discontinuous sequences of sets, continuous or discontinuous sequences of sequences, or sets of continuous or discontinuous sequences in the datasets,
wherein a plurality of original hierarchical patterns are defined as sequential patterns, each sequential pattern is composed of a collection of itemsets, an element of the sequential pattern is formed by an assortment of items selected from a predefined set of items, and one item occurs just once in one itemset,
wherein the plurality of original hierarchical patterns are flattened into strings by adding and using delimiters and place-holders to reduce the automata design space for candidate sequential patterns and to convert the matching into a discontinuous sequence-matching problem,
wherein the delimiters are configured to bound and connect the strings,
wherein an itemset delimiter is added between two adjacent strings converted from two adjacent itemsets in a sequence,
wherein the plurality of state transition elements includes place-holder state transition elements configured to represent the place-holders, the place-holders including an itemset position holder and an item position holder,
wherein the plurality of state transition elements further includes item state transition elements configured to represent the items in the predefined set of items, and each item in the predefined set of items has a corresponding place-holder,
wherein the first place-holder corresponding to the first item in each itemset is the itemset position holder, and any other place-holders corresponding to items except the first item are item position holders,
wherein, during a sequential pattern matching, a place-holder state transition element representing the itemset position holder is configured to stay activated before the end of a sequence, and a place-holder state transition element representing the item position holder is configured to stay activated only within an input itemset,
wherein the place-holder state transition element representing the itemset position holder is configured to match symbols including all items in the predefined set of items and the itemset delimiter, and the place-holder state transition element representing the item position holder is configured to match symbols including all items in the predefined set of items except the delimiters,
wherein template automata for discovering the plurality of hierarchical patterns in the datasets are compiled before runtime and replicated to make full use of the capacity and parallelism of the processor, and
wherein the processor is configured to reduce the automata design space of the candidate sequential patterns of a given length from multiple patterns to a single pattern template for pre-compiling a library of automata for each level.
|
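
The flattening and place-holder behaviour recited in claim 1 can be illustrated with a small software simulation. The following is a minimal sketch, not the claimed hardware: the delimiter symbol `"|"`, the function names `flatten` and `matches`, and the three hold modes (`"S"`, `"I"`, `"D"`) are assumptions introduced here for illustration. Mode `"S"` mimics the itemset position holder (stays active across items and itemset delimiters), mode `"I"` mimics the item position holder (stays active only within one input itemset), and mode `"D"` is an added helper state that waits for an itemset delimiter before the next pattern itemset may begin.

```python
ITEMSET_DELIM = "|"  # hypothetical delimiter symbol; must not collide with any item


def flatten(sequence):
    """Flatten a list of itemsets into one token list.

    Items inside each itemset are sorted and de-duplicated, and an itemset
    delimiter is placed between two adjacent itemsets.
    """
    tokens = []
    for i, itemset in enumerate(sequence):
        if i:
            tokens.append(ITEMSET_DELIM)
        tokens.extend(sorted(set(itemset)))
    return tokens


def matches(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` as a sequential pattern.

    Each pattern itemset must be contained in a distinct input itemset, in
    order. The check is run as discontinuous matching over flattened tokens.
    """
    items, starts = [], []  # flat pattern items; True where an itemset begins
    for itemset in pattern:
        for j, item in enumerate(sorted(set(itemset))):
            items.append(item)
            starts.append(j == 0)
    if not items:
        return True

    active = {(0, "S")}  # set of (next pattern position, hold mode)
    for sym in flatten(sequence):
        is_delim = sym == ITEMSET_DELIM
        nxt = set()
        for pos, mode in active:
            if mode == "D":
                # Wait for the input itemset to end before starting the next one.
                nxt.add((pos, "S" if is_delim else "D"))
                continue
            if mode == "S":
                nxt.add((pos, "S"))   # holds on items and itemset delimiters
            elif not is_delim:
                nxt.add((pos, "I"))   # holds on items only; dies at a delimiter
            if not is_delim and sym == items[pos]:
                new = pos + 1
                if new == len(items):
                    return True
                nxt.add((new, "D" if starts[new] else "I"))
        active = nxt
    return False
```

For example, `matches([{"a", "b"}, {"c"}], [{"a", "x", "b"}, {"y"}, {"c", "d"}])` succeeds because `{a, b}` is contained in the first input itemset and `{c}` in a later one, while `matches([{"a"}, {"b"}], [{"a", "b"}])` fails because both pattern itemsets would have to share one input itemset.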