Figure 3-21. Results perspective for the Chapter3 data set.
17) You can toggle between the design and results perspectives using the two icons indicated
by the black arrows in Figure 3-21. As you can see, the results perspective offers a rich set
of information. The meta data view gives basic descriptive statistics, and it is also where
we can see how many observations have missing values in each attribute of the data set.
The columns in the meta data view can be stretched to make their contents more readable:
hover your mouse over the faint vertical gray bars between the columns, then click and drag
to make them wider.
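For readers who want to see what such a per-attribute summary amounts to outside the tool, here is a minimal sketch in plain Python. It is not how the software computes its meta data view; it only tallies missing and non-missing values per attribute. The rows are hypothetical: the Online_Gaming tallies are chosen to match the counts reported for that attribute in the Chapter3 data set, while the row order, the Age column, and the function name summarize are invented for illustration.

```python
from collections import Counter

# Hypothetical rows; None marks a missing value.
# Online_Gaming is built to contain six 'N', two 'Y', and three missing values.
# The Age column and the row order are invented for this sketch.
rows = [
    {"Age": 34, "Online_Gaming": "N"},
    {"Age": 27, "Online_Gaming": "Y"},
    {"Age": 51, "Online_Gaming": None},
    {"Age": 45, "Online_Gaming": "N"},
    {"Age": None, "Online_Gaming": "N"},
    {"Age": 39, "Online_Gaming": None},
    {"Age": 62, "Online_Gaming": "N"},
    {"Age": 23, "Online_Gaming": "Y"},
    {"Age": 58, "Online_Gaming": "N"},
    {"Age": 30, "Online_Gaming": None},
    {"Age": 41, "Online_Gaming": "N"},
]


def summarize(rows):
    """Return {attribute: {"missing": int, "values": Counter}} for a list of dict rows."""
    summary = {}
    for row in rows:
        for attr, value in row.items():
            stats = summary.setdefault(attr, {"missing": 0, "values": Counter()})
            if value is None:
                stats["missing"] += 1      # count the gap, do not tally it as a value
            else:
                stats["values"][value] += 1
    return summary


summary = summarize(rows)
for attr, stats in summary.items():
    print(attr, "missing:", stats["missing"], "values:", dict(stats["values"]))
```

Keeping the missing count separate from the value tallies makes it easy to see how much of an attribute would be affected by any replacement strategy before one is chosen.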
The information presented here can be very helpful in deciding where missing data are
located, and what to do about them. Take, for example, the Online_Gaming attribute. The
results perspective shows us that we have six 'N' responses in that attribute, two 'Y'
responses, and three missing values. We could use the mode, or most common response, to
replace the missing values. This of course assumes that the most common response holds
for every observation with a missing value, which may not be true. As data miners, we must be
responsible for thinking about each change we make in our data, and whether or not we
threaten the integrity of our data by making that change. In some instances the
consequences could be drastic. Consider, for instance, if the mode for an attribute of
Felony_Conviction were 'Y'. Would we really want to convert all missing values in this
attribute to 'Y' simply because that is the mode in our data set? Probably not; the