CPC G06F 16/248 (2019.01) | 20 Claims
1. A data visualization system for explaining network structures in data, comprising:
a processor; and
a memory, where the memory contains a data visualization application that configures the processor to:
obtain a tabular database comprising:
a plurality of rows; and
a plurality of columns;
extract a network representation of the tabular database, where the network representation comprises:
a plurality of nodes, where each node in the plurality of nodes represents a unique value in a target column in the plurality of columns; and
a plurality of edges, where each edge connects two nodes in the plurality of nodes and reflects a shared value in one or more associative columns in the plurality of columns;
identify communities within the network representation;
add a community column to the tabular database, where values for each row in the community column indicate the community to which that row belongs;
recursively, until a predefined breakpoint is hit:
construct a tree structure for each associative column by partitioning the identified community column values of respective associative columns into each respective tree structure;
calculate a branch disorder value for each branch of each tree structure;
calculate whole-tree disorder for each tree structure based on the calculated branch disorder values;
partition the community column values into branches of the tree having a lowest whole-tree disorder of the calculated whole-tree disorders;
extract a plurality of explanatory rules based on the traversal of the tree having the lowest whole-tree disorder of the calculated whole-tree disorders; and
provide the plurality of explanatory rules.
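
As an illustration only, the network-extraction and community-labelling steps recited above might be sketched in Python as follows. The claim does not name a community-detection method, so connected components are used here purely as a stand-in. The table contents, the column names (`person`, `club`, `community`), and all function names are hypothetical and do not come from the claim.

```python
from collections import defaultdict
from itertools import combinations


def extract_network(rows, target, associative):
    """Nodes are unique target-column values; an edge joins two nodes
    that share a value in at least one associative column."""
    nodes = {row[target] for row in rows}
    edges = set()
    for col in associative:
        by_value = defaultdict(set)
        for row in rows:
            by_value[row[col]].add(row[target])
        for members in by_value.values():
            # Sorting gives each undirected edge one canonical orientation.
            for a, b in combinations(sorted(members), 2):
                edges.add((a, b))
    return nodes, edges


def identify_communities(nodes, edges):
    """Stand-in community detection (an assumption, not the claimed method):
    label every node with the id of its connected component."""
    adjacent = defaultdict(set)
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    label = {}
    next_id = 0
    for start in sorted(nodes):
        if start in label:
            continue
        label[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacent[node]:
                if neighbour not in label:
                    label[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1
    return label


def add_community_column(rows, target, communities):
    """Return copies of the rows with an added 'community' column."""
    return [dict(row, community=communities[row[target]]) for row in rows]


# Hypothetical toy table: 'person' is the target column, 'club' is associative.
ROWS = [
    {"person": "ann", "club": "chess"},
    {"person": "bob", "club": "chess"},
    {"person": "cat", "club": "golf"},
    {"person": "dan", "club": "golf"},
]
NODES, EDGES = extract_network(ROWS, "person", ["club"])
COMMUNITIES = identify_communities(NODES, EDGES)
LABELLED = add_community_column(ROWS, "person", COMMUNITIES)
```

In this toy table, `ann` and `bob` share a club and so fall into one community, while `cat` and `dan` fall into another. A modularity-based or other detection method could be substituted for `identify_communities` without changing the rest of the sketch.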
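
The recursive tree-selection steps of claim 1 can be pictured with a minimal Python sketch, under stated assumptions. The claim does not define how disorder is measured. Here branch disorder is assumed to be the Shannon entropy of the community labels in a branch, and whole-tree disorder is assumed to be the size-weighted average of the branch disorders. The breakpoint (a pure branch, no columns left, or a depth limit), the toy rows, and every function and parameter name are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def branch_disorder(labels):
    """Assumed measure: Shannon entropy (in bits) of the community labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def split(rows, column):
    """Partition rows into branches, one branch per value of the column."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[column]].append(row)
    return branches


def whole_tree_disorder(rows, column, community="community"):
    """Assumed aggregate: size-weighted average of the branch disorders."""
    return sum(
        len(branch) / len(rows)
        * branch_disorder([r[community] for r in branch])
        for branch in split(rows, column).values()
    )


def extract_rules(rows, columns, community="community", max_depth=3, path=()):
    """Recursively split on the lowest-disorder column and return one
    (conditions, community) rule per leaf of the resulting tree."""
    labels = [r[community] for r in rows]
    # Assumed breakpoint: pure branch, no columns left, or depth limit hit.
    if branch_disorder(labels) == 0 or not columns or len(path) >= max_depth:
        majority = Counter(labels).most_common(1)[0][0]
        return [(path, majority)]
    best = min(columns, key=lambda c: whole_tree_disorder(rows, c, community))
    remaining = [c for c in columns if c != best]
    rules = []
    for value, branch in sorted(split(rows, best).items()):
        rules += extract_rules(branch, remaining, community, max_depth,
                               path + ((best, value),))
    return rules


# Hypothetical rows that already carry a community column.
ROWS = [
    {"club": "chess", "city": "Oslo", "community": 0},
    {"club": "chess", "city": "Rome", "community": 0},
    {"club": "golf", "city": "Oslo", "community": 1},
    {"club": "golf", "city": "Rome", "community": 1},
]
RULES = extract_rules(ROWS, ["club", "city"])
```

With these rows, splitting on `club` yields pure branches (whole-tree disorder 0), whereas splitting on `city` leaves both communities mixed in every branch (whole-tree disorder 1 bit). The sketch therefore selects `club` and yields two rules, which can be read as "club = chess implies community 0" and "club = golf implies community 1".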