US 11,886,513 B2
Data analysis system, data analysis method, and computer program product
Hidenori Matsuzaki, Fuchu (JP); and Xinxiao Li, Yokohama (JP)
Assigned to Kabushiki Kaisha Toshiba, Minato-ku (JP)
Filed by Kabushiki Kaisha Toshiba, Minato-ku (JP)
Filed on Aug. 28, 2018, as Appl. No. 16/114,345.
Claims priority of application No. 2018-041097 (JP), filed on Mar. 7, 2018.
Prior Publication US 2019/0278871 A1, Sep. 12, 2019
Int. Cl. G06F 16/904 (2019.01); G06F 16/901 (2019.01); G06T 11/20 (2006.01); G06F 40/18 (2020.01)
CPC G06F 16/904 (2019.01) [G06F 16/901 (2019.01); G06F 40/18 (2020.01); G06T 11/206 (2013.01)]
17 Claims
 
1. A data analysis system comprising:
a memory; and
one or more processors coupled to the memory, the one or more processors being configured to:
provide a graphical interface configured to select a time column and a feature column from a plurality of columns contained in a first data set and an attribute value range to be extracted from a plurality of attribute values contained in the selected time column;
extract, from the first data set, a second data set corresponding to the selected time and feature columns and the attribute value range;
select a unit of aggregation for the extracted second data set, wherein a number of attribute values contained in the unit of aggregation is smaller than a number of attribute values contained in the attribute value range;
group attribute values contained in the second data set by the unit of aggregation to generate a plurality of groups for the second data set, wherein a number of the plurality of groups is smaller than the number of attribute values contained in the attribute value range;
perform, for each of the generated plurality of groups and based on a predetermined parameter related to a restriction on a resource of the system, an aggregation process on attribute values contained in a corresponding group to generate a third data set for the plurality of groups, wherein the third data set has a reduced data amount relative to the second data set;
store the reduced third data set in the memory;
analyze the third data set stored in the memory to generate an analysis result;
perform clustering processing on the reduced third data to generate a plurality of cluster objects each corresponding to a respective range of time in the third data set;
display a data image provided by visualizing the third data set and an analysis result image provided by visualizing the analysis result of the third data set, the data image displaying the cluster objects;
determine whether one or more of the cluster objects corresponds to a singular object; and
display the singular object on the data image, the singular object corresponding to, among the plurality of groups, a group of which a first data number indicating a number of pieces of data used for performing the aggregation process is equal to or larger than a predetermined value, the singular object being displayed on the data image with a display format different from that of another cluster object,
change the selected column or the attribute value range via the graphical interface; and
update the data image and the analysis result image in a linked manner in accordance with the change to the selected column or the attribute value range, wherein
the plurality of attribute values of the time column in the first data set are not aligned at equal intervals so that, among the plurality of groups, a number of attribute values contained in one group is different from a number of attribute values of another group of the plurality of groups.
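
The extraction, grouping, aggregation, and singular-object steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Everything named here is a hypothetical assumption made for illustration: the function names, the column names `t` and `temp`, the use of the mean as the aggregation process, the derivation of the unit of aggregation from a maximum group count (standing in for the resource-related parameter), and the sample rows. The clustering, analysis, and display steps are omitted.

```python
from collections import defaultdict
from statistics import mean


def choose_unit(t_min, t_max, max_groups):
    """Pick a unit of aggregation so that at most max_groups groups result.

    Assumption: max_groups plays the role of the predetermined,
    resource-related parameter (for example, a memory budget).
    """
    span = t_max - t_min + 1
    return -(-span // max_groups)  # ceiling division


def extract(rows, time_col, feature_col, t_min, t_max):
    """Second data set: (time, feature) pairs with time in [t_min, t_max]."""
    return [
        (row[time_col], row[feature_col])
        for row in rows
        if t_min <= row[time_col] <= t_max
    ]


def aggregate(pairs, t_min, unit, singular_min_count):
    """Third data set: one aggregated record per time group.

    A group is flagged as singular when the number of records used for
    its aggregation is equal to or larger than singular_min_count.
    """
    groups = defaultdict(list)
    for t, value in pairs:
        groups[(t - t_min) // unit].append(value)
    return [
        {
            "start": t_min + key * unit,
            "value": mean(values),  # assumed aggregation: arithmetic mean
            "count": len(values),
            "singular": len(values) >= singular_min_count,
        }
        for key, values in sorted(groups.items())
    ]


# Hypothetical first data set; the time values are not equally spaced.
ROWS = [
    {"t": 0, "temp": 1.0},
    {"t": 1, "temp": 2.0},
    {"t": 2, "temp": 3.0},
    {"t": 3, "temp": 4.0},
    {"t": 10, "temp": 10.0},
    {"t": 11, "temp": 20.0},
    {"t": 25, "temp": 7.0},
    {"t": 40, "temp": 99.0},  # outside the selected range below
]

unit = choose_unit(0, 29, max_groups=3)
second = extract(ROWS, "t", "temp", 0, 29)
third = aggregate(second, 0, unit, singular_min_count=3)
```

Because the sample time values are unevenly spaced, the three groups hold different numbers of records, and only the first group reaches the assumed singular threshold. The third data set has three records where the second has seven, which is the data reduction the claim describes.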