introduced at this time to allow you to compare this method with other filtering
options available in VisMiner using the parallel plot and location plot viewers.
For most datasets, the parallel plot is the recommended tool for generating
filtered subsets, because it gives visual feedback as observations are filtered
out. Control Center filtering can be more effective when applied to nominal
data types. For example, in the practice example just completed, only homes in
the Alpine and Provo school districts were selected. Such a selection is not
possible with the parallel plot, where filtering is specified using range
sliders, so the values kept must form a contiguous run along the axis. Because
nominal values are listed alphabetically, Nebo falls between Alpine and Provo,
and it would be impossible to keep the Alpine and Provo observations while
eliminating the Nebo observations.
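The difference between the two filtering styles can be illustrated with a small Python sketch. It is not VisMiner code: the functions `set_filter` and `slider_reachable_subsets` and the sample list of districts are hypothetical, and the slider is assumed to keep a contiguous run of categories in alphabetical order, as the text describes.

```python
# Hypothetical sample of school districts (illustrative, not the tutorial dataset).
districts = ["Alpine", "Provo", "Nebo", "Alpine", "Nebo", "Provo"]

def set_filter(values, keep):
    """Control Center style: keep any chosen combination of nominal values."""
    return [v for v in values if v in keep]

def slider_reachable_subsets(values):
    """Slider style (assumed): categories are ordered alphabetically on the axis,
    and a range slider can keep only a contiguous run of them."""
    order = sorted(set(values))  # Alpine, Nebo, Provo
    return [set(order[i:j + 1])
            for i in range(len(order))
            for j in range(i, len(order))]

print(set_filter(districts, {"Alpine", "Provo"}))
# Nebo sits between Alpine and Provo, so no contiguous run keeps exactly those two.
print({"Alpine", "Provo"} in slider_reachable_subsets(districts))
```

Enumerating every run a slider could select makes the limitation concrete: the set-membership filter can express "Alpine or Provo", while every slider-reachable subset that contains both also contains Nebo.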
Exercise 3.1
Use the CmpltHomes.csv dataset prepared in the previous tutorial.
a. Look for patterns in the relationship between location and year built. What
areas have mostly newer homes?
b. When evaluating the relationship between lot size and location, as with
price, the few homes on very large lots (up to 200 acres) dominate the color
encoding. Using the range sliders alone to restrict the selection lacks
precision: over 90% of the homes are on lots of less than one acre, yet the
range slider moves in one-acre increments. Thus, moving the left “Lot” range
slider cannot gradually remove the smaller-lot homes. At zero, they are all
present; at the next slider position, they are all gone. Use the parallel
coordinate plot to first create a subset of the homes having lot sizes of less
than two acres, then use the location plot to evaluate the relationship
between lot size and location.
c. In many areas, proximity to a lake increases a home's value. Does this
appear to be the case for homes in the Provo Metropolitan area? What
geographic setting appears to add value to a home in this area?
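The problem described in part b can be sketched in Python, assuming the color scale is a linear min-max mapping of lot size onto [0, 1] and that the “Lot” slider moves in whole-acre steps. Neither assumption is documented VisMiner behavior, the function names are hypothetical, and the lot sizes are made-up values, not taken from CmpltHomes.csv.

```python
# Hypothetical lot sizes in acres (illustrative values, not from CmpltHomes.csv).
lots = [0.18, 0.25, 0.35, 0.5, 0.9, 1.6, 200.0]

def kept_by_left_slider(values, position):
    """Assumed slider behavior: the left handle sits on a whole acre and keeps
    only values at or above that position."""
    return [v for v in values if v >= position]

def color_position(values):
    """Assumed color encoding: linear min-max scaling of each value onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Slider granularity: position 0 keeps every lot, position 1 drops all sub-acre lots at once.
print(len(kept_by_left_slider(lots, 0)), len(kept_by_left_slider(lots, 1)))

# Color encoding over all homes: the 200-acre outlier squeezes the small lots together.
full = color_position(lots)

# Two-step approach: subset to lots under two acres, then re-encode the subset.
subset = [v for v in lots if v < 2.0]
rescaled = color_position(subset)

print([round(p, 3) for p in full])
print([round(p, 3) for p in rescaled])
```

Under these assumptions, with the 200-acre lot present every lot under two acres lands in the bottom 1% of the color scale; after subsetting, the same lots spread across the full range, which is why the exercise filters first and evaluates the location plot second.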
Dataset preparation - creating computed columns
Occasionally, needed or desirable columns are not included in a dataset but
can be computed from other values in the set. For example, suppose that in the
CmpltHomes.csv dataset, the relationship between price and location is to be
explored. However, looking at price alone is not sufficient, since price is mostly
determined by the size of the home. A possible measure representing both price
 