Figure 3-27. Adding observation filter parameters.
Go ahead and run your model by clicking the play button. In results perspective, you will now see
that your data set has been reduced from eleven observations (or examples) to nine. This is
because the two observations where the Online_Shopping attribute had a missing value have been
removed. You'll be able to see that they're gone by selecting the Data View radio button. They
have not been deleted from the original source data; they are simply removed from the data set at
the point in the stream where the filter operator is located, and they will no longer be considered
in any downstream data mining operations. In instances where the missing value cannot be safely
assumed or computed, removal of the entire observation is often the best course of action. When
attributes are numeric in nature, such as ages or the number of visits to a certain place, a
measure of central tendency, such as the mean, median or mode, might be an acceptable
replacement for missing values, but for more subjective attributes, such as whether one is an online
shopper or not, you may be better off simply filtering out observations where the datum is missing.
(One cool trick you can try in RapidMiner is to use the Invert Filter option in design perspective.
In this example, if you check that check box in the parameters pane of the Filter Examples
operator, you will keep the missing observations, and filter out the rest.)
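For readers who like to see the logic spelled out, the filtering described above can be sketched in a few lines of plain Python. This is only an analogy, not how RapidMiner works internally and not a RapidMiner API. The data values are made up for illustration, the Age attribute is hypothetical, and the function names filter_examples and replace_missing_with_median are invented here. None stands in for a missing value.

```python
from statistics import median

# Made-up data set: eleven observations, two with a missing Online_Shopping
# value and one with a missing Age. Age is a hypothetical attribute.
observations = [
    {"Age": 25, "Online_Shopping": "Y"},
    {"Age": 31, "Online_Shopping": "N"},
    {"Age": None, "Online_Shopping": "Y"},
    {"Age": 42, "Online_Shopping": None},
    {"Age": 19, "Online_Shopping": "Y"},
    {"Age": 37, "Online_Shopping": "N"},
    {"Age": 53, "Online_Shopping": "N"},
    {"Age": 28, "Online_Shopping": None},
    {"Age": 46, "Online_Shopping": "Y"},
    {"Age": 34, "Online_Shopping": "Y"},
    {"Age": 60, "Online_Shopping": "N"},
]


def filter_examples(rows, attribute, invert=False):
    """Keep rows where `attribute` is present.

    With invert=True, keep only the rows where it is missing instead,
    loosely mirroring the Invert Filter check box.
    """
    return [row for row in rows if (row[attribute] is None) == invert]


def replace_missing_with_median(rows, attribute):
    """Return new rows with missing numeric values replaced by the median."""
    known = [row[attribute] for row in rows if row[attribute] is not None]
    fill = median(known)
    return [
        {**row, attribute: fill if row[attribute] is None else row[attribute]}
        for row in rows
    ]


# Both functions build new lists, so the source data is left untouched.
kept = filter_examples(observations, "Online_Shopping")
missing_only = filter_examples(observations, "Online_Shopping", invert=True)
filled = replace_missing_with_median(observations, "Age")

print(len(observations), len(kept), len(missing_only))
```

As in the RapidMiner stream, the original list is never modified: the filter only determines which observations are passed on to later steps. The median replacement is shown for the numeric attribute only, since a central-tendency substitute makes less sense for a yes/no attribute like Online_Shopping.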
Data mining can be confusing and overwhelming, especially when data sets get large. It doesn't
have to be, though, if we manage our data well. The previous example has shown how to filter out
observations containing undesired data (or missing data) in an attribute, but we can also reduce
data to test out a data mining model on a smaller subset of our data. This can greatly reduce