Keep in mind that not all inconsistent data will be as easy to handle as replacing a single
value. In addition to the inconsistent value of 99, values of 87, 96, 101, or others could be
present in a data set. If so, it might take multiple replacements and/or missing data operators
to prepare the data set for mining. In numeric data we might also come across values that are
accurate but are also statistical outliers. These might also be considered inconsistent data, so
an example in a later chapter will illustrate the handling of statistical outliers. Data scrubbing
can become tedious, but it ultimately affects the usefulness of data mining results, so these
activities are important, and attention to detail is critical.
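
Outside of RapidMiner, the idea of replacing several inconsistent values at once can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set of valid values (0 and 1) and the sample data are hypothetical, and None stands in for a missing value.

```python
# Hypothetical assumption: the attribute may legitimately hold only 0 or 1.
VALID_VALUES = {0, 1}


def scrub(values, valid=VALID_VALUES):
    """Return a new list in which every value outside `valid` becomes None."""
    return [v if v in valid else None for v in values]


# Made-up sample containing the kinds of stray values mentioned above.
raw = [1, 0, 99, 1, 87, 0, 96, 101]
clean = scrub(raw)
print(clean)
```

Checking against a set of valid values, rather than listing each bad value one by one, means a newly discovered stray value such as 101 does not require another replacement rule.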
ATTRIBUTE REDUCTION
In many data sets, you will find that some attributes are simply irrelevant to answering a given
question. In Chapter 4 we will discuss methods for evaluating correlation, or the strength of
relationships between given attributes. In some instances, you will not know how useful a
certain attribute will be without statistically assessing that attribute's correlation to the other
data you will be evaluating. In our process stream in RapidMiner, we can remove attributes that
are not relevant to answering a given question without completely deleting them from the data
set. Remember, just because certain variables in a data set aren't interesting for answering one
question doesn't mean those variables won't ever be interesting. This is why we recommended
bringing in all attributes when importing the Chapter 3 data set earlier in this chapter:
uninteresting or irrelevant attributes are easy to exclude within your stream by following
these steps:
1) Return to design perspective. In the operator search field, type Select Attribute. The
Select Attributes operator will appear. Drag it onto the end of your stream so that it fits
between the Replace operator and the result set port. Your window should look like
Figure 3-32.
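
The idea behind attribute selection, keeping only some attributes for the current question while leaving the full data set intact, can be sketched in plain Python. This is not how RapidMiner implements Select Attributes; the function name select_attributes and the attribute names below are hypothetical, chosen only for illustration.

```python
def select_attributes(rows, keep):
    """Return new rows holding only the attributes named in `keep`.

    The original rows are not modified, so excluded attributes remain
    available for other questions later.
    """
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical data set with one attribute that is irrelevant to the question.
data_set = [
    {"id": 1, "age": 34, "favorite_color": "blue", "purchased": 1},
    {"id": 2, "age": 51, "favorite_color": "red", "purchased": 0},
]

subset = select_attributes(data_set, ["age", "purchased"])
print(subset)
```

Because the function builds new rows instead of deleting keys, favorite_color is still present in data_set if a later question needs it.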
 