Discriminant Analysis - Data Mining for the Masses

Toyota, Honda, Ford, etc.), or you could record it by body style (e.g. Car, Truck,

SUV, etc.). Be consistent in assigning classifications, and note that depending on

the size of the data set you create, you won't want to have too many possible

classifications, or your predictions in the scoring data set will be spread out too

much. With small data sets containing only 20-30 observations, the number of

categories should be limited to three or four. You might even consider using

Japanese, American, and European as your Car_Type values.

5) Once you've compiled your Training data set, switch to the Scoring sheet in OpenOffice

Calc. Repeat the data entry process for at least 20 people (more is better) that you know

who do not have a car. You will use the training set to try to predict the type of car each of

these people would drive if they had one.

6) Use the File > Save As menu option in OpenOffice Calc to save your Training and Scoring

sheets as CSV files.

7) Import your two CSV files into your RapidMiner repository. Be sure to give them

descriptive names.

8) Drag your two data sets into a new process window. If you have prepared your data well

in OpenOffice Calc, you shouldn't have any missing or inconsistent data to contend with,

so data preparation should be minimal. Rename the two retrieve operators so you can tell

the difference between your training and scoring data sets.

9) One necessary data preparation step is to add a Set Role operator and define the Car_Type

attribute as your label.

10) Add a Linear Discriminant Analysis operator to your Training stream.

11) Apply your LDA model to your scoring data and run your model. Evaluate and report

your results. Did you get any confidence percentages? Do the predicted Car_Types seem

reasonable and consistent with your training data? Why or why not?
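
For readers who want to see what the Linear Discriminant Analysis step does with the two data sets, the following is a minimal plain-Python sketch, not the RapidMiner process itself. It assumes two numeric attributes named Age and Household_Size; those names and every row of data are made up for illustration. Only Car_Type and its values (Car, Truck, SUV) come from the exercise. The sketch counts observations per category, fits a two-attribute LDA with a pooled covariance matrix, and reports a confidence for each category on the scoring rows.

```python
import csv
import io
import math
from collections import Counter, defaultdict

# Made-up rows standing in for the Training and Scoring CSV exports.
# Age and Household_Size are hypothetical attribute names.
TRAINING_CSV = """Age,Household_Size,Car_Type
22,1,Car
25,2,Car
24,1,Car
27,2,Car
45,2,Truck
48,3,Truck
50,2,Truck
47,3,Truck
35,5,SUV
38,6,SUV
36,4,SUV
33,5,SUV
"""
SCORING_CSV = """Age,Household_Size
23,1
49,3
36,5
"""
FEATURES = ["Age", "Household_Size"]
LABEL = "Car_Type"  # the attribute given the label role in step 9


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fit_lda(rows):
    """Estimate class means, priors, and the inverse pooled covariance.

    This sketch is hard-coded for exactly two numeric attributes so the
    2x2 matrix inverse can be written out by hand.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[LABEL]].append([float(row[f]) for f in FEATURES])
    n = len(rows)
    means = {c: [sum(col) / len(xs) for col in zip(*xs)] for c, xs in groups.items()}
    priors = {c: len(xs) / n for c, xs in groups.items()}
    # Pooled within-class covariance, divided by (rows - classes).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for c, xs in groups.items():
        for x in xs:
            d = [x[0] - means[c][0], x[1] - means[c][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j] / (n - len(groups))
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return means, priors, inv


def confidences(x, model):
    """Return a confidence per class for one observation x = [f1, f2]."""
    means, priors, inv = model
    scores = {}
    for c, m in means.items():
        # Linear discriminant: x.w - 0.5 * m.w + log(prior), with w = inv * m.
        w = [inv[0][0] * m[0] + inv[0][1] * m[1], inv[1][0] * m[0] + inv[1][1] * m[1]]
        scores[c] = (x[0] * w[0] + x[1] * w[1]
                     - 0.5 * (m[0] * w[0] + m[1] * w[1])
                     + math.log(priors[c]))
    # Normalize the exponentiated scores so the confidences sum to 1.
    top = max(scores.values())
    weights = {c: math.exp(v - top) for c, v in scores.items()}
    total = sum(weights.values())
    return {c: wt / total for c, wt in weights.items()}


training = read_rows(TRAINING_CSV)
scoring = read_rows(SCORING_CSV)

# Check how many observations each category has before modeling.
class_counts = Counter(row[LABEL] for row in training)
print("Observations per Car_Type:", dict(class_counts))

model = fit_lda(training)
predictions = []
for row in scoring:
    conf = confidences([float(row[f]) for f in FEATURES], model)
    predictions.append((max(conf, key=conf.get), conf))
    print(row, "->", predictions[-1][0], {c: round(p, 3) for c, p in conf.items()})
```

The per-category counts printed first are a quick way to check the advice about limiting small data sets to three or four categories. The confidences are the kind of values to look for when answering the questions in step 11, although RapidMiner's own output may be formatted differently.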
