MODELING
8) We now have a functional stream. Go ahead and run the model as it is now. With the mod
port connected to the res port, RapidMiner will generate Discriminant Analysis output for
us.
Figure 7-8. The results of discriminant analysis on our training data set.
9) The probabilities given in the results will total to 1. This is because at this stage of our
Discriminant Analysis model, all that has been calculated is the likelihood of an observation
landing in one of the four categories in our target attribute of Prime_Sport. Because this is
our training data set, RapidMiner can calculate these probabilities easily, since every
observation is already classified. Football has a probability of 0.3237. If you refer back to
Figure 7-2, you will see that Football as Prime_Sport comprised 160 of our 493
observations. On those counts, the probability of an observation having Football would be
160/493, or 0.3245. But in steps 3 and 4 (Figures 7-3 and 7-4), we removed 11 observations that had
inconsistent data in their Decision_Making attribute. Four of these were Football
observations (Figure 7-4), so our Football count dropped to 156 and our total count
dropped to 482: 156/482 = 0.3237. Since we have no observations where the value for
Prime_Sport is missing, each possible value in Prime_Sport will have some portion of the
total count, and the sum of these portions will equal 1, as is the case in Figure 7-8. These
probabilities, coupled with the values for each attribute, will be used to predict the
Prime_Sport classification for each of Gill's current clients represented in our scoring data
set. Return now to design perspective and, in the Repositories tab, drag the Chapter 7
scoring data set over and drop it in the main process window. Do not connect it to your
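
The Football arithmetic described in step 9 can be checked with a short Python sketch. This is only an illustration, not part of the RapidMiner process: the variable names are our own, and because the counts for the other three Prime_Sport values are not given in this passage, they are lumped into a single "other" group simply to show that the portions sum to 1.

```python
# Counts taken from the text: 493 observations before cleaning, 160 of them
# Football; 11 observations removed in steps 3 and 4, 4 of them Football.
total_before = 493
football_before = 160
removed_total = 11
removed_football = 4

total_after = total_before - removed_total            # 482
football_after = football_before - removed_football   # 156

# Probability of Football = Football count / total count.
prior_before = football_before / total_before
prior_after = football_after / total_after

# Assumption for illustration: the other three sports are lumped together,
# since their individual counts are not stated in this passage.
other_after = total_after - football_after
prior_other = other_after / total_after

print(f"Football before cleaning: {prior_before:.4f}")
print(f"Football after cleaning:  {prior_after:.4f}")
print(f"Sum of all portions:      {prior_after + prior_other:.4f}")
```

The second value printed should match the Football probability reported in Figure 7-8, since it is computed from the cleaned training data rather than the original 493 observations.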
 