Discriminant Analysis - Data Mining for the Masses


MODELING

8) We now have a functional stream. Go ahead and run the model as it is now. With the mod port connected to the res port, RapidMiner will generate Discriminant Analysis output for us.

Figure 7-8. The results of discriminant analysis on our training data set.

9) The probabilities given in the results total 1. This is because, at this stage of our Discriminant Analysis model, all that has been calculated is the likelihood of an observation landing in one of the four categories in our target attribute, Prime_Sport. Because this is our training data set, RapidMiner can calculate these probabilities easily: every observation is already classified. Football has a probability of 0.3237. If you refer back to Figure 7-2, you will see that Football as Prime_Sport comprised 160 of our 493 observations. Thus, the probability of an observation having Football as its Prime_Sport would be 160/493, or 0.3245. But in steps 3 and 4 (Figures 7-3 and 7-4), we removed 11 observations that had inconsistent data in their Decision_Making attribute. Four of these were Football observations (Figure 7-4), so our Football count dropped to 156 and our total count dropped to 482: 156/482 = 0.3237. Since we have no observations where the value for Prime_Sport is missing, each possible value in Prime_Sport will have some portion of the total count, and the sum of these portions will equal 1, as is the case in Figure 7-8. These probabilities, coupled with the values for each attribute, will be used to predict the Prime_Sport classification for each of Gill's current clients represented in our scoring data set. Return now to design perspective and, in the Repositories tab, drag the Chapter 7 scoring data set over and drop it in the main process window. Do not connect it to your
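
If you would like to check the arithmetic behind the Football probability in step 9 outside of RapidMiner, a few lines of Python will do it. This is only a minimal sketch, not part of the RapidMiner process: the function name class_priors is our own, and because the text gives counts only for Football, the other three sports are lumped together under a hypothetical "Other" label.

```python
def class_priors(counts):
    """Return each class's share of the total count (its prior probability)."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Counts stated in the text: 493 observations, 160 of them Football.
# "Other" is an assumed catch-all for the three sports whose individual
# counts are not given in this passage.
before = {"Football": 160, "Other": 493 - 160}

# 11 observations were removed in steps 3 and 4; 4 of them were Football.
after = {"Football": 160 - 4, "Other": (493 - 11) - (160 - 4)}

priors_before = class_priors(before)
priors_after = class_priors(after)

print(f"Football before filtering: {priors_before['Football']:.4f}")
print(f"Football after filtering:  {priors_after['Football']:.4f}")
print(f"Sum of priors after filtering: {sum(priors_after.values()):.4f}")
```

The first two printed values should match the 0.3245 and 0.3237 worked out in step 9, and the third shows why the portions total 1: every remaining observation belongs to exactly one category.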
