Figure 2.24 PCP Backside
If desired, you could confirm this independence by computing a correlation matrix
for the Pollen data.
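The correlation-matrix check mentioned above can be sketched in plain Python. This is a minimal sketch, not the book's procedure: the helper names `pearson` and `correlation_matrix` and the toy columns "A", "B", "C" are hypothetical, and the values are illustrative, not the Pollen measurements. Keep in mind that a Pearson r near zero only rules out a linear relationship, which is weaker than full independence.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Raises ZeroDivisionError if either column is constant (zero variance).
    return cov / (ss_x * ss_y)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy columns (hypothetical, not the Pollen data).
toy = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],    # a linear function of A, so r(A, B) is 1
    "C": [1.0, -1.0, -1.0, 1.0],  # no linear trend against A, so r(A, C) is 0
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Off-diagonal entries close to zero are what you would expect for attributes that do not vary together.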
Extracting sub-populations using the parallel coordinate plot
The PCP is an excellent tool for visually searching for sub-populations within a
dataset. One good indicator of sub-populations is a ribbon of lighter shading
running through the plot. Do you see any in the Pollen data of Figure 2.23?
Another indicator comes from the attribute distributions: rotate the plot slightly
to the right or left so that the density along each axis is shown as height. When
you see a multimodal (multiple-peaked) distribution, each peak likely represents
a sub-population present in the full dataset (Figure 2.25).
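Counting the peaks of a single attribute's distribution can be approximated numerically with a histogram. The sketch below is a crude stand-in for the density view described above, not the plotting tool's own method: the helpers `histogram` and `count_peaks`, the choice of 10 bins, and the `min_count` noise threshold are all hypothetical, and the values are toy data already scaled to axis height (0 = bottom, 1 = top). The number of peaks found depends on the bin count, so treat the result as a hint, not a test.

```python
def histogram(values, bins, lo, hi):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        i = int((v - lo) / width)
        counts[min(max(i, 0), bins - 1)] += 1  # clamp, so v == hi lands in the top bin
    return counts


def count_peaks(counts, min_count=1):
    """Count local maxima in a histogram; a flat-topped peak counts once."""
    padded = [0] + list(counts) + [0]  # lets a peak at either end of the axis count
    runs = [padded[0]]
    for c in padded[1:]:
        if c != runs[-1]:  # collapse runs of equal counts (plateaus)
            runs.append(c)
    return sum(
        1
        for i in range(1, len(runs) - 1)
        if runs[i - 1] < runs[i] > runs[i + 1] and runs[i] >= min_count
    )


# Toy attribute scaled to axis height (hypothetical, not Pollen data):
# one cluster near mid-level and a smaller one higher up the axis.
values = [0.35] * 1 + [0.45] * 3 + [0.55] * 6 + [0.65] * 2 + [0.75] * 4 + [0.85] * 1
counts = histogram(values, bins=10, lo=0.0, hi=1.0)
print(counts)               # [0, 0, 0, 1, 3, 6, 2, 4, 1, 0]
print(count_peaks(counts))  # 2
```

Raising `min_count` discards small bumps; for example, `count_peaks(counts, min_count=5)` keeps only the taller mid-level peak.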
Drag the Ridge axis to the right end.
Rotate slightly to see two very distinct peaks on the Ridge axis. The tallest
peak is at about mid-level on the axis, where you would expect the peak of a
normal distribution to be, although it is higher than a normal curve would
predict. The second peak is about two-thirds of the way up from the bottom of
the axis.
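Outside the plotting tool, one way to emulate pulling out the rows under one of these peaks is to keep only the rows whose Ridge position on the axis falls inside a narrow band around that peak. This is a sketch under stated assumptions, not the book's extraction method: axis positions are taken as min-max scaled (bottom = 0, top = 1), the band centres 0.5 and 2/3 mirror the "mid-level" and "two-thirds" descriptions above, the half-width of 0.05 is arbitrary, and `axis_positions`, `extract_band`, and the toy "Ridge" values are hypothetical.

```python
def axis_positions(values):
    """Map raw values to PCP axis positions: 0.0 at the bottom (min), 1.0 at the top (max)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # constant attribute: every line sits mid-axis
    return [(v - lo) / (hi - lo) for v in values]


def extract_band(rows, column, center, half_width):
    """Split rows by whether `column`'s axis position lies within center +/- half_width."""
    positions = axis_positions([row[column] for row in rows])
    inside, outside = [], []
    for row, pos in zip(rows, positions):
        (inside if abs(pos - center) <= half_width else outside).append(row)
    return inside, outside


# Toy rows with a hypothetical "Ridge" column (not the real Pollen values).
ridge = [0, 50, 52, 49, 67, 66, 100]
rows = [{"id": i, "Ridge": v} for i, v in enumerate(ridge)]

# Band around the tallest peak (mid-level) and around the second peak (about 2/3 up).
mid, rest = extract_band(rows, "Ridge", center=0.5, half_width=0.05)
upper, _ = extract_band(rows, "Ridge", center=2 / 3, half_width=0.05)
print([r["id"] for r in mid])    # [1, 2, 3]
print([r["id"] for r in upper])  # [4, 5]
```

Returning both the selected rows and the remainder keeps the full dataset recoverable, so each sub-population can be examined on its own and compared with what is left.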
To explore these sub-populations in detail, we need to extract them from the
full dataset. We'll extract the tallest peak first: the light-colored ribbon that we see