US 10,891,335 B2
Enhanced exploration of dimensionally reduced data
Marco Cavallo, New York, NY (US); and Cagatay Demiralp, New York, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jan. 3, 2018, as Appl. No. 15/860,719.
Prior Publication US 2019/0205478 A1, Jul. 4, 2019
Int. Cl. G06F 16/901 (2019.01); G06F 17/18 (2006.01); G06T 11/20 (2006.01); G06F 17/15 (2006.01); G06F 17/17 (2006.01); G06K 9/00 (2006.01); G06F 16/26 (2019.01)
CPC G06F 16/9024 (2019.01) [G06F 16/26 (2019.01); G06F 17/15 (2013.01); G06F 17/17 (2013.01); G06F 17/18 (2013.01); G06K 9/00 (2013.01); G06T 11/203 (2013.01); G06T 11/206 (2013.01)]
17 Claims
 
1. A method for enhanced exploration of dimensionally reduced data, the method comprising:
obtaining at least one data set having a plurality of data objects, wherein each data object is characterized by a plurality of numerical features;
applying a dimensionality reduction technique to the at least one data set;
generating a two-dimensional scatter plot of the at least one data set, wherein each data object in the plurality of data objects corresponds to a data point in a plurality of data points in the two-dimensional scatter plot;
selecting a data point for a projection line in the plurality of data points in the two-dimensional scatter plot;
generating a projection line for each numerical feature of the data object corresponding to the data point for the projection line, wherein a length of the projection line indicates an importance of each numerical feature on the position of the data point, and wherein generating a projection line for each numerical feature of the data object comprises:
calculating the mean, standard deviation, minimum value and maximum value for each numerical feature; and
plotting the projection line;
altering at least one numerical feature of a data object in the plurality of data objects, wherein a position of the data point for the projection line corresponding to the data object is affected in real-time, and wherein remaining data points in the plurality of data points are unaffected.
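
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything it adds is an assumption. The claim does not name a dimensionality reduction technique, so PCA (computed with NumPy's SVD) is used as a stand-in. The claim also does not say how a projection line is constructed from the per-feature statistics. Here the line for a feature is assumed to be the path the selected point traces when that feature is swept from its minimum to its maximum with the projection held fixed. All function names (fit_pca, project, feature_stats, projection_line, line_length) are hypothetical, and the data is synthetic.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit a linear projection (PCA via SVD); returns (mean, components)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_components]

def project(x, mu, components):
    """Map one data object (or a batch) into the 2-D plot using a fixed projection."""
    return (x - mu) @ components.T

def feature_stats(X):
    """Mean, standard deviation, minimum and maximum of every numerical feature."""
    return {"mean": X.mean(axis=0), "std": X.std(axis=0),
            "min": X.min(axis=0), "max": X.max(axis=0)}

def projection_line(x, feature, stats, mu, components, steps=11):
    """Assumed construction: sweep one feature from min to max and project each variant."""
    values = np.linspace(stats["min"][feature], stats["max"][feature], steps)
    variants = np.tile(x, (steps, 1))
    variants[:, feature] = values
    return project(variants, mu, components)

def line_length(line):
    """Total length of the polyline traced in the 2-D plot."""
    return float(np.linalg.norm(np.diff(line, axis=0), axis=1).sum())

# Synthetic data set: 50 data objects, 4 numerical features with different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([5.0, 1.0, 0.5, 0.1])

mu, comps = fit_pca(X)
points = project(X, mu, comps)          # coordinates for the 2-D scatter plot
stats = feature_stats(X)

selected = 0                            # index of the data point chosen for projection lines
lines = [projection_line(X[selected], f, stats, mu, comps) for f in range(X.shape[1])]
lengths = [line_length(line) for line in lines]

# Alter one feature of the selected data object and re-project only that object.
altered = X[selected].copy()
altered[0] += 2.0
new_points = points.copy()
new_points[selected] = project(altered, mu, comps)
```

In this sketch the projection is fitted once and then reused, so re-projecting the altered object is a single matrix product and the other rows of new_points are left untouched. Because PCA is linear, each projection line is a straight segment. A nonlinear reduction technique would generally yield curved paths and would need its own way of projecting a modified object.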