is 249 (the number in the Sardinia and North regions combined). Note the
position of any sliders that you adjusted to eliminate the South region
observations. The positions of these sliders become a classification rule.
Now hide the South region filter group in the first PCP in order to focus on
the other two. Again using the first PCP as a guide, adjust the sliders in the
second PCP to eliminate all observations of one region while leaving all
observations of the other region visible. Once complete, the positions of
these sliders become the second classification rule.
c. Working with the first PCP, the one with the three filter groups, hide the
North and South region filter groups, leaving only the Sardinia region visible.
Looking at the linoleic axis, you see two distinct sub-populations. How are
these sub-populations related to area?
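The two slider-based rules from step b can be written down as plain range
filters. The following is a minimal sketch, not VisMiner's own mechanism: the
attribute names (attr_a, attr_b) and the slider ranges are hypothetical
placeholders, and it assumes the second rule was set so that Sardinia remains
visible while North is eliminated.

```python
from typing import Dict, Tuple

# A rule records slider positions: attribute name -> (low, high) range kept visible.
Rule = Dict[str, Tuple[float, float]]


def passes(obs: Dict[str, float], rule: Rule) -> bool:
    """Return True when every attribute in the rule lies inside its slider range."""
    return all(low <= obs[attr] <= high for attr, (low, high) in rule.items())


def classify(obs: Dict[str, float], not_south: Rule, sardinia: Rule) -> str:
    """Apply the two rules in sequence, mirroring the two PCP filtering steps."""
    # Rule 1: observations filtered out by the first set of sliders are South.
    if not passes(obs, not_south):
        return "South"
    # Rule 2 (assumed orientation): what stays visible is Sardinia, the rest North.
    return "Sardinia" if passes(obs, sardinia) else "North"


# Hypothetical slider positions; substitute the positions you noted in the PCPs.
rule_1: Rule = {"attr_a": (0.0, 5.0)}
rule_2: Rule = {"attr_b": (10.0, 20.0)}

print(classify({"attr_a": 7.0, "attr_b": 15.0}, rule_1, rule_2))
print(classify({"attr_a": 3.0, "attr_b": 15.0}, rule_1, rule_2))
print(classify({"attr_a": 3.0, "attr_b": 25.0}, rule_1, rule_2))
```

Applying the rules in order matters: the second rule is only meaningful for
observations that the first rule has already separated from the South region.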
To explore the data distribution assessment features of the PCP:
Close the PCP for the Iris data.
Open the file Pollen.csv.
View its “Summary Statistics”.
The Pollen dataset contains five measures of grains of pollen: Crack, Density,
Nub, Ridge, and Weight. There are almost four thousand observations. The
dataset is synthetic; it was created for use in a data mining competition
whose objective was to find significant sub-populations within the dataset.
We will use it to assess data distributions.
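For readers who want to reproduce the kind of figures a summary-statistics
view reports, here is a minimal sketch using only the Python standard library.
It is not VisMiner's implementation, and the values are randomly generated
stand-ins, not the real Pollen measurements; only the five column names and
the observation count come from the text.

```python
import random
import statistics

COLUMNS = ["Crack", "Density", "Nub", "Ridge", "Weight"]


def summarize(rows, columns=COLUMNS):
    """Compute count, mean, standard deviation, min and max for each column."""
    summary = {}
    for col in columns:
        values = [row[col] for row in rows]
        summary[col] = {
            "count": len(values),
            "mean": statistics.fmean(values),
            "stdev": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Stand-in data: random values, NOT the actual contents of Pollen.csv.
rng = random.Random(0)
rows = [{col: rng.gauss(0.0, 1.0) for col in COLUMNS} for _ in range(3838)]

stats = summarize(rows)
for col, s in stats.items():
    print(f"{col:8s} n={s['count']} mean={s['mean']:.3f} "
          f"sd={s['stdev']:.3f} min={s['min']:.3f} max={s['max']:.3f}")
```

To run this against the real file, the rows could be loaded with
csv.DictReader and converted to float, assuming the CSV header uses these
five column names.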
View the Pollen data in a PCP by dragging its dataset icon up to a display
and selecting “Parallel Plot”.
The initial PCP of the Pollen data (Figure 2.22) looks quite different from
that of the Iris data due to its much larger observation count (3838 compared to
150). Only a few of the individual observation line segments are
distinguishable. To assist in evaluating the distributions of each attribute,
the densities
are color encoded. The lighter shades of red, transitioning toward yellow,
indicate areas of greater observation density. In VisMiner, the PCP is actually a
3-D plot. To view:
Rotate the plot to the right or left by dragging, and you will see that the
areas of greater density, in addition to being color encoded, are also raised
up from the surface of the plot. By rotating about 20 degrees to the left, you
can clearly see the distributions of the Crack attribute at the left end and the
Weight attribute at the right end.
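The idea of a density that drives both color and height can be illustrated
with a simple per-axis histogram. This is a simplified sketch under stated
assumptions, not VisMiner's actual algorithm: it bins the values of a single
attribute into equal-width bins (the bin count of 20 is arbitrary) and scales
the counts so that the densest bin has value 1.0. The sample values are
randomly generated, not taken from the Pollen data.

```python
import random


def axis_density(values, bins=20):
    """Relative observation density along one axis, scaled to the range 0..1.

    The result could be mapped to a color ramp (for example red to yellow)
    and to a height above the plot surface.
    """
    lo, hi = min(values), max(values)
    # Guard against a zero-width range when all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    return [c / peak for c in counts]


# Stand-in values for one attribute: random, NOT real Pollen measurements.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(3838)]

density = axis_density(sample)
for level in density:
    # Crude text rendering: longer bars mark denser regions of the axis.
    print("#" * round(level * 40))
```

With thousands of observations, the individual lines overlap too heavily to
read, so a binned density like this summarizes where the observations
concentrate along each axis.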