Data Preparation - Data Mining for the Masses

Figure 3-27. Adding observation filter parameters.

Go ahead and run your model by clicking the play button. In results perspective, you will now see

that your data set has been reduced from eleven observations (or examples) to nine. This is

because the two observations where the Online_Shopping attribute had a missing value have been

removed. You'll be able to see that they're gone by selecting the Data View radio button. They

have not been deleted from the original source data; they are simply removed from the data set at

the point in the stream where the filter operator is located and will no longer be considered in any

downstream data mining operations. In instances where the missing value cannot be safely

assumed or computed, removal of the entire observation is often the best course of action. When

attributes are numeric in nature, such as with ages or number of visits to a certain place, an

arithmetic measure of central tendency, such as the mean, median, or mode, might be an acceptable

replacement for missing values, but for more subjective attributes, such as whether one is an online

shopper or not, you may be better off simply filtering out observations where the datum is missing.

(One cool trick you can try in RapidMiner is to use the Invert Filter option in design perspective.

In this example, if you check that check box in the parameters pane of the Filter Examples

operator, you will keep the missing observations, and filter out the rest.)
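
If it helps to see the same ideas outside of RapidMiner, here is a minimal Python sketch of them. It is not how the Filter Examples operator is implemented; it only mimics the behavior described above. The rows are made up for illustration (eleven observations, two of them missing Online_Shopping), the Age attribute is an assumed numeric column, and the names filter_examples and impute_median are hypothetical helpers, not RapidMiner functions. Missing values are assumed to be stored as None.

```python
from statistics import median

# Made-up rows for illustration only; this is not the book's data set.
# Assumption: a missing value is represented as None.
rows = [
    {"Age": 25, "Online_Shopping": "Y"},
    {"Age": 41, "Online_Shopping": "N"},
    {"Age": None, "Online_Shopping": "Y"},
    {"Age": 33, "Online_Shopping": None},
    {"Age": 58, "Online_Shopping": "N"},
    {"Age": 19, "Online_Shopping": "Y"},
    {"Age": 47, "Online_Shopping": None},
    {"Age": 36, "Online_Shopping": "Y"},
    {"Age": 29, "Online_Shopping": "N"},
    {"Age": 62, "Online_Shopping": "N"},
    {"Age": 50, "Online_Shopping": "Y"},
]


def filter_examples(rows, attribute, invert=False):
    """Keep rows where `attribute` is present.

    With invert=True, keep only the rows where it is missing,
    loosely mirroring the Invert Filter check box.
    """
    return [r for r in rows if (r[attribute] is not None) != invert]


def impute_median(rows, attribute):
    """Return copies of the rows with missing numeric values set to the median."""
    known = [r[attribute] for r in rows if r[attribute] is not None]
    fill = median(known)
    return [
        {**r, attribute: fill if r[attribute] is None else r[attribute]}
        for r in rows
    ]


kept = filter_examples(rows, "Online_Shopping")
missing_only = filter_examples(rows, "Online_Shopping", invert=True)
imputed = impute_median(rows, "Age")

# The source list is untouched; only the derived lists differ.
print(len(rows), len(kept), len(missing_only))  # prints: 11 9 2
```

Notice that both helpers build new lists rather than changing rows, just as the filter operator leaves the original source data alone. Filtering is used for Online_Shopping because there is no safe value to assume, while the median is used for the numeric Age attribute.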

Data mining can be confusing and overwhelming, especially when data sets get large. It doesn't

have to be, though, if we manage our data well. The previous example has shown how to filter out

observations containing undesired data (or missing data) in an attribute, but we can also reduce

data to test out a data mining model on a smaller subset of our data. This can greatly reduce
