processing time while testing a model to see if it will work to answer our questions. Follow the
steps below to take a sample of our data set in RapidMiner.
1) Using the search techniques previously demonstrated, use the Operators search feature to
find an operator called 'Sample' and add it to your stream. In the parameters pane, set
the sample to be a 'relative' sample, and then indicate that you want to retain 50% of your
observations in the resulting data set by typing .5 into the sample ratio field. Your window
should look like Figure 3-28.
Figure 3-28. Taking a 50% random sample of the data set.
2) When you run your model now, you will find that your results contain only four or five
observations, randomly selected from the nine that remained after our filter operator
removed records that had missing Online_Shopping values.
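For readers who want to see the idea outside of RapidMiner, the relative sampling in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RapidMiner's own implementation: the function name relative_sample, the seed argument, and the placeholder row labels are all hypothetical, and the rule used here for turning the ratio into a row count (Python's built-in rounding) may differ from what the Sample operator does internally.

```python
import random


def relative_sample(rows, ratio, seed=None):
    """Return a random subset holding roughly `ratio` of `rows`.

    Hypothetical helper for illustration; the original order of the
    selected rows is preserved.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)
    # Assumption: the number of rows to keep is the rounded product.
    k = round(len(rows) * ratio)
    # Draw k distinct positions without replacement, then restore order.
    chosen = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in chosen]


# Nine placeholder observations standing in for the rows left after filtering.
observations = [f"row_{i}" for i in range(1, 10)]

sample = relative_sample(observations, 0.5, seed=42)
print(len(sample), "of", len(observations), "observations kept")
```

With nine observations and a ratio of .5 the target is 4.5 rows, which cannot be met exactly. That is why the text says four or five observations; this sketch keeps four because Python rounds 4.5 down to the nearest even integer.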
Thus you can see that there are many ways, and various reasons, to reduce data by decreasing the
number of observations in your data set. We'll now move on to handling inconsistent data, but
before doing so, it is important to reset our data back to its original form. While
filtering, we removed an observation that we will need in order to illustrate what inconsistent data
is, and to demonstrate how to handle it in RapidMiner. This is a good time to learn how to
remove operators from your stream. Switch back to design perspective and click on your
Sample operator. Next, right-click and choose Delete, or simply press the Delete key on your