| CPC G06T 1/20 (2013.01) [G06F 9/3851 (2013.01)] | 5 Claims |

|
1. A graphics processing unit (GPU)-based logic rewriting acceleration method, comprising parallelizing sub-procedures of And-Inverter Graph (AIG)-based logic rewriting, wherein the parallelizing sub-procedures of AIG-based logic rewriting comprises the following steps:
on a central processing unit (CPU), asynchronously selecting, by a scheduler, a group of same-level nodes from an input AIG-based logic network; then copying the nodes to a memory of a GPU, such that the GPU sequentially enables a kernel to rewrite the nodes in parallel, and selecting, by the CPU, another group of nodes for the GPU to rewrite, such that the scheduling and rewriting procedures completely overlap; and on the GPU, completing parallel node rewriting through cut enumeration, maximum fan-out-free cone (MFFC) computation, evaluation, and replacement, wherein
the cut enumeration adopts a parallel cut enumeration algorithm designed based on a subgraph level, to simultaneously process the same-level nodes from a low level to a high level; nodes in a subgraph used for replacement in a previous scheduling cycle are added to different LAN[slevel] lists based on a level of the subgraph, wherein slevel represents a local level of a node in the subgraph used for replacement, cuts of nodes in a same LAN[slevel] list are computed in parallel, and then Boolean functions corresponding to the cuts are computed; and
the MFFC computation adopts a top-down computation mode to parallelize a recursive MFFC computation algorithm, wherein starting from a cut list on which the MFFC computation is to be performed, each MFFC computation of a cut is allocated to a thread block, threads in the same thread block collaborate to compute an MFFC of the cut, and after the thread block obtains the corresponding cut, the following steps are performed:
step 1: adding a root node of a current cut to a corresponding MFFC set of the root node, wherein the MFFC is a set that stores all last MFFC nodes;
step 2: adding left and right children of the root node to a buffering region F1; extracting, by each thread, a node n from the buffering region F1; if all fan-outs of the node n are in a current MFFC set, evaluating a function IS_MFFC_NODE(n) as true; and in this case, adding all the fan-outs of the node n to a buffering region F2, and adding the node n to the MFFC set;
step 3: repeating step 2 until all nodes in the buffering region F1 are processed, and then swapping the roles of F1 and F2; and
step 4: processing one leaf node of the current cut by using the same method as described in steps 2 to 3, until all leaf nodes of the current cut are processed;
in a replacement stage, a lockless replacement algorithm is used to avoid the heavy data contention caused by frequently modifying a logic graph, wherein to resolve a conflict in which different threads with overlapping MFFCs delete the same node, a node scheduler is used to group nodes with non-overlapping MFFCs; and
to resolve a conflict in which the MFFCs possessed by different threads do not overlap, but one thread wants to share a node that is to be deleted by another thread, a replacement process is divided into two stages: in a first stage, each block of the GPU deletes an MFFC and constructs a new subgraph; and in a second stage, each block of the GPU combines all equivalent nodes into a hyper-node that supports concurrent deletion/creation operations for one node without introducing a lock between different threads.
|
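The frontier-based MFFC computation recited in steps 1 to 3 of claim 1 can be illustrated with a minimal sequential sketch. It is not the claimed GPU implementation: the function name `mffc_of_cut`, the dictionary-based `fanins`/`fanouts` graph encoding, and the example network are all assumptions made for illustration. Two further assumptions are labeled in the code. First, when a node is accepted, the sketch pushes that node's fan-ins (children) into F2, which is the conventional way to walk down a cone, whereas the claim text recites adding fan-outs. Second, the cut leaves are treated as a boundary that is never entered, which is only one possible reading of step 4. The inner loop over F1 stands in for the threads of one thread block.

```python
def mffc_of_cut(root, leaves, fanins, fanouts):
    """Sequential sketch of the frontier-based MFFC walk for one cut.

    root    : root node of the cut
    leaves  : set of cut leaves (assumption: treated as a boundary)
    fanins  : dict node -> list of children (hypothetical encoding)
    fanouts : dict node -> list of parents  (hypothetical encoding)
    """
    # Step 1: the root always belongs to its own MFFC set.
    mffc = {root}

    # Step 2 (seeding): the children of the root go into buffer F1.
    f1 = [c for c in fanins.get(root, ()) if c not in leaves]

    while f1:
        f2 = []
        # On a GPU, each thread of the block would take one node n from F1.
        for n in f1:
            if n in mffc:
                continue  # already accepted in an earlier round
            # IS_MFFC_NODE(n): every fan-out of n is already in the MFFC set.
            if all(o in mffc for o in fanouts.get(n, ())):
                mffc.add(n)
                # Assumption: push the fan-ins of n, not its fan-outs.
                f2.extend(c for c in fanins.get(n, ()) if c not in leaves)
        # Step 3: F1 is exhausted, so F1 and F2 swap roles.
        f1 = f2
    return mffc


# Hypothetical example network: n3 = AND(n1, n2), n1 = AND(a, b),
# n2 = AND(b, c); n2 also drives n4, which lies outside the cone of n3.
fanins = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["n1", "n2"], "n4": ["n2", "a"]}
fanouts = {"a": ["n1", "n4"], "b": ["n1", "n2"], "c": ["n2"],
           "n1": ["n3"], "n2": ["n3", "n4"], "n3": [], "n4": []}

result = mffc_of_cut("n3", {"a", "b", "c"}, fanins, fanouts)
```

In this example `n2` is rejected because its fan-out `n4` is not in the MFFC set, so deleting the cone of `n3` must not delete `n2`. A node rejected in one round is checked again whenever another of its parents is accepted later, because that parent pushes it into F2 again.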