Distribution assessment
Control Center summary statistics - contains the minimum, maximum, mean, and
standard deviation for all numeric attributes. For discrete attributes (text and
integer), it provides cardinalities (the number of unique values), with distribution
counts shown in a pop-up when you hover over the cardinality cell.
Histogram - creates histograms for selected attributes in the dataset.
Parallel coordinate plot - observation densities are both color encoded and
plotted as heights along the Z dimension.
Scatter plot - distributions become visible when the same attribute is chosen for
both the X and Y axes.
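As a rough illustration of the summary statistics described above, the following Python sketch computes the same kinds of values outside the tool. It is not the Control Center's implementation: the function name summarize, the sample records, and the choice of the sample (n - 1) standard deviation are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(rows):
    """Return per-attribute summary statistics for a list of dict records.

    Hypothetical helper: each record maps attribute names to values.
    """
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        stats = {}
        # Discrete attributes (text and integer): cardinality plus value counts.
        if all(isinstance(v, (str, int)) and not isinstance(v, bool) for v in values):
            counts = Counter(values)
            stats["cardinality"] = len(counts)
            stats["counts"] = dict(counts)
        # Numeric attributes: minimum, maximum, mean, standard deviation.
        # Assumption: sample standard deviation; the tool may use another form.
        # stdev() needs at least two values.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            stats["min"] = min(values)
            stats["max"] = max(values)
            stats["mean"] = mean(values)
            stats["stdev"] = stdev(values)
        summary[name] = stats
    return summary


# Made-up records used only to exercise the function.
rows = [
    {"species": "a", "petals": 3, "length": 1.5},
    {"species": "b", "petals": 4, "length": 2.5},
    {"species": "a", "petals": 3, "length": 3.5},
]

summary = summarize(rows)
for attribute, stats in summary.items():
    print(attribute, stats)
```

Integer attributes receive both sets of values here, because the text treats them as numeric and as discrete at the same time.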
Pattern/relationship search
The ultimate purpose of data mining is to find previously unknown relationships
or patterns between attributes. The search begins during initial exploration when
patterns of potential interest are first identified. In later data mining steps, a more
in-depth analysis of these patterns is conducted using the data mining algorithms.
Viewers supporting pattern search during initial exploration include:
Correlation matrix - displays color encoded correlations between all
attribute pairs. It visually draws attention to related attributes.
Correlation matrix synchronized with scatter plot - when the same dataset is
viewed using both the correlation matrix and scatter plot, the axes of the
scatter plot are synchronized with attribute pair selections in the correlation
matrix. Use the correlation matrix to quickly identify correlated attributes,
then methodically click on each cell of interest in the correlation matrix to
show the relationship in the scatter plot. With each selection you may want
to add a third Z axis attribute to the plot for more complex pattern searches.
If the objective is classification, select the classification attribute as the
scatter plot “Category” attribute for color encoding.
Parallel coordinate plot - although the PCP is best used for subset recognition
and extraction, it also supports pattern searches. Crossing line patterns
between adjacent axes represent inverse correlation, while nearly parallel
line patterns represent direct correlation. To assess patterns between
non-adjacent axes, drag one axis toward the other until they are adjacent. If
the objective is classification, create filters for each of the classification
attribute values. Note which attributes best discriminate between filters. If
there are so many observations that the plot becomes too cluttered to
interpret accurately, check the “Show means” box.
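The workflow of scanning a correlation matrix for related attribute pairs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which correlation measure the viewer uses, so Pearson correlation is assumed here, and the function names, the 0.8 threshold, and the sample columns are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # undefined for a constant attribute
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map every attribute pair to its correlation (dict of dicts)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def strongest_pairs(matrix, threshold=0.8):
    """List attribute pairs with |r| >= threshold, strongest first."""
    names = list(matrix)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = matrix[a][b]
            if abs(r) >= threshold:  # NaN compares False and is skipped
                pairs.append((a, b, r))
    return sorted(pairs, key=lambda pair: -abs(pair[2]))


# Made-up columns used only to exercise the functions.
columns = {
    "temp": [10, 20, 30, 40],
    "sales": [12, 19, 33, 41],
    "stock": [40, 30, 20, 10],
    "noise": [5, 1, 4, 2],
}

matrix = correlation_matrix(columns)
pairs = strongest_pairs(matrix)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:+.3f}")
```

Pairs returned by strongest_pairs are the ones worth clicking through in the scatter plot. A negative r corresponds to the crossing line pattern in the parallel coordinate plot, and a positive r to the nearly parallel pattern.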