Database Reference
Toyota, Honda, Ford, etc.), or you could record it by body style (e.g. Car, Truck,
SUV, etc.). Be consistent in assigning classifications, and note that depending on
the size of the data set you create, you won't want to have too many possible
classifications, or your predictions in the scoring data set will be spread out too
much. With small data sets containing only 20-30 observations, the number of
categories should be limited to three or four. You might even consider using
Japanese, American, and European as your Car_Type values.
5) Once you've compiled your Training data set, switch to the Scoring sheet in OpenOffice
Calc. Repeat the data entry process for at least 20 people you know (more is better)
who do not have a car. You will use the training set to try to predict the type of car each of
these people would drive if they had one.
6) Use the File > Save As menu option in OpenOffice Calc to save your Training and Scoring
sheets as CSV files.
7) Import your two CSV files into your RapidMiner repository. Be sure to give them
descriptive names.
8) Drag your two data sets into a new process window. If you have prepared your data well
in OpenOffice Calc, you shouldn't have any missing or inconsistent data to contend with,
so data preparation should be minimal. Rename the two retrieve operators so you can tell
the difference between your training and scoring data sets.
9) One necessary data preparation step is to add a Set Role operator and define the Car_Type
attribute as your label.
10) Add a Linear Discriminant Analysis operator to your Training stream.
11) Apply your LDA model to your scoring data and run your model. Evaluate and report
your results. Did you get any confidence percentages? Do the predicted Car_Types seem
reasonable and consistent with your training data? Why or why not?
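
Before importing your CSV files into RapidMiner, you may find it helpful to check them for the problems the steps above warn about: too many categories, inconsistent classifications, and missing values. The following is only a minimal sketch in Python, not part of the exercise itself. The only name taken from the exercise is Car_Type; the other column names, the sample rows, the check_training function and its max_categories limit of four are all invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for an exported Training CSV.
# Only the Car_Type column name comes from the exercise; the other
# columns and all values are invented for illustration.
TRAINING_CSV = """Age,Commute_Miles,Car_Type
34,12,Japanese
51,30,American
28,5,European
45,,American
39,18,japanese
"""

def check_training(csv_text, label="Car_Type", max_categories=4):
    """Return (label counts, data rows with blanks, warnings) for CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row[label] for row in rows)
    # Data row numbers are 1-based and do not count the header line.
    blanks = [i for i, row in enumerate(rows, start=1)
              if any((value or "").strip() == "" for value in row.values())]
    warnings = []
    if len(counts) > max_categories:
        warnings.append(
            f"{len(counts)} categories exceeds the limit of {max_categories}")
    # Labels that differ only by case usually signal inconsistent data entry.
    folded = Counter(name.strip().lower() for name in counts)
    for name, n in folded.items():
        if n > 1:
            warnings.append(f"inconsistent spelling of label '{name}'")
    return counts, blanks, warnings

counts, blanks, warnings = check_training(TRAINING_CSV)
print(counts)
print("rows with blanks:", blanks)
print(warnings)
```

With the sample rows above, the check flags the blank Commute_Miles value in the fourth data row and the mismatch between "Japanese" and "japanese". To check your own file, you could read its contents into a string and pass that to check_training in place of the sample text.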