| CPC G06F 3/0482 (2013.01) [G06F 3/04845 (2013.01); G06F 8/38 (2013.01); G06F 16/254 (2019.01); G06F 16/26 (2019.01); G06T 11/206 (2013.01); G06F 9/451 (2018.02); G06F 2203/04803 (2013.01)] | 20 Claims |

|
1. A method of preparing data for subsequent analysis, comprising:
at a computer system having a display, one or more processors, and memory storing one or more programs configured for execution by the one or more processors:
displaying a user interface that includes a concurrent display of a data flow pane, a profile pane, and a data pane, including displaying, in the data flow pane, a flow diagram including a plurality of linked nodes, each node specifying (i) a respective operation performed on respective data of a selected data source and (ii) a respective intermediate data set of transformed data that is generated upon execution of the respective operation, wherein at least some of the transformed data is distinct from the respective data of the selected data source;
receiving user selection of one or more nodes, from the plurality of linked nodes, in the flow diagram;
in response to receiving the user selection:
displaying, in the profile pane, schemas corresponding to the selected one or more nodes, including (1) data elements showing information about data fields of one or more intermediate data sets of transformed data, corresponding to the selected one or more nodes, and (2) statistical information about data values for the data fields;
determining a sample of data rows corresponding to the one or more intermediate data sets of transformed data; and
displaying, in the data pane, the sample of data rows corresponding to the one or more intermediate data sets of transformed data;
receiving, via the profile pane, a first user interaction to modify a first data element of the data elements in the profile pane; and
in response to receiving the first user interaction via the profile pane, modifying the flow diagram displayed in the data flow pane, including:
identifying an operation to perform that encapsulates the first user interaction; and
updating the selected one or more nodes, or adding a new node, to include the operation in the flow diagram.
|
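The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every name here (`Node`, `profile`, `sample`, `rename_field`, `apply_interaction`) is hypothetical, the "panes" are reduced to plain function results, and the statistics are limited to row and distinct-value counts. The sketch takes the "adding a new node" branch of the final step. Updating the selected node in place would be the other option the claim allows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Operation = Callable[[List[Row]], List[Row]]


@dataclass
class Node:
    """Hypothetical flow-diagram node: an operation plus a link to its input node."""
    name: str
    operation: Operation
    parent: Optional["Node"] = None

    def intermediate(self, source: List[Row]) -> List[Row]:
        # The intermediate data set is produced by running every upstream
        # operation and then this node's own operation.
        rows = self.parent.intermediate(source) if self.parent else source
        return self.operation(rows)


def profile(rows: List[Row]) -> Dict[str, Dict[str, int]]:
    """Profile-pane stand-in: field names with simple per-field statistics."""
    values: Dict[str, list] = {}
    for row in rows:
        for key, value in row.items():
            values.setdefault(key, []).append(value)
    return {k: {"count": len(v), "distinct": len(set(v))} for k, v in values.items()}


def sample(rows: List[Row], n: int = 2) -> List[Row]:
    """Data-pane stand-in: the first n rows of the intermediate data set."""
    return rows[:n]


def rename_field(old: str, new: str) -> Operation:
    """Operation that encapsulates a 'rename field' interaction (assumed example)."""
    def op(rows: List[Row]) -> List[Row]:
        return [{(new if k == old else k): v for k, v in row.items()} for row in rows]
    return op


def apply_interaction(flow: List[Node], selected: Node,
                      operation: Operation, label: str) -> Node:
    """Modify the flow by appending a new node linked to the selected node."""
    new_node = Node(label, operation, parent=selected)
    flow.append(new_node)
    return new_node


# Usage: one filter node, selected by the user, then renamed via the profile view.
source = [{"st": "CA", "sales": 10}, {"st": "WA", "sales": 5}, {"st": "CA", "sales": 7}]
keep_large = Node("keep sales > 5", lambda rows: [r for r in rows if r["sales"] > 5])
flow = [keep_large]

selected_rows = keep_large.intermediate(source)
schema_and_stats = profile(selected_rows)
preview = sample(selected_rows)

renamed = apply_interaction(flow, keep_large, rename_field("st", "state"), "rename st")
renamed_rows = renamed.intermediate(source)
```

Modelling each node as a function over rows keeps the intermediate data set derivable on demand, so an edit made through the profile view only needs to add or change one operation in the flow.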