likely to be the most interesting. You may include the athletes' names; however, keep in
mind that correlations can only be calculated on numeric data, so the name attribute
would need to be removed from your data set before you create your correlation matrix.
(Remember the Select Attributes operator!)
3) Look up the statistics for each of your selected attributes and enter them as observations
into your spreadsheet. Try to find as many as you can. At least thirty is a good rule of
thumb for achieving a basic level of statistical validity, and more is better.
4) Once you've created your data set, use the menu to save it as a CSV file. Click File, then
Save As. Enter a file name, and set 'Save as type:' to Text CSV (.csv). Be sure to
save the file in your data mining data folder.
5) Open RapidMiner and import your data set into your RapidMiner repository. Name it
Chapter4Exercise, or something else descriptive, so that you will remember what data are
contained in the data set when you look in your repository.
6) Add the data set to a new process in RapidMiner. Ensure that the out port is connected to
a res port and run your model. Save your process with a descriptive name if you wish.
Examine your data in results perspective and ensure there are no missing, inconsistent, or
other potentially problematic data that might need to be handled as part of your Data
Preparation phase. Return to design perspective and handle any data preparation tasks that
may be necessary.
7) Add a Correlation Matrix operator to your stream and ensure that the mat port is
connected to a res port. Run your model again. Interpret your correlation coefficients as
displayed on the matrix tab.
8) Document your findings. What correlations exist? How strong are they? Are they
surprising to you, and if so, why? What other attributes would you like to add? Are there
any you would eliminate now that you've mined your data?
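If you would rather record your observations (steps 3 and 4) in a script than in a spreadsheet, Python's standard `csv` module can produce the same kind of CSV file. This is a minimal sketch; the attribute names and values are hypothetical placeholders, not real statistics.

```python
import csv
import io

# Hypothetical attributes and observations, standing in for the statistics
# you would look up in step 3.
header = ["Name", "Height", "Weight", "Points"]
observations = [
    ["Athlete A", 72, 190, 12.5],
    ["Athlete B", 75, 205, 9.0],
]

# Write to an in-memory buffer for illustration; for a real file you would
# use open(path, "w", newline="") pointing at your data mining data folder.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)         # first row: attribute names
writer.writerows(observations)  # one row per observation
print(buffer.getvalue())
```

The default writer quotes any value that contains a comma, so a name such as "Smith, Jr." stays in a single column when RapidMiner imports the file.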
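The preparation advice in the steps above (dropping the non-numeric name attribute, as Select Attributes does, and checking for missing or inconsistent values) can be sketched in plain Python. This is a rough sketch, not RapidMiner's behavior: the function `load_numeric_rows`, the "Name" column, and the sample rows are all hypothetical, and problem rows are simply reported so you can decide how to handle them.

```python
import csv
import io

def load_numeric_rows(csv_text, drop=("Name",)):
    """Parse CSV text, drop the listed non-numeric attributes, and convert
    every remaining value to float. Returns (kept attribute names, clean
    rows, record numbers of rows that could not be converted)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [c for c in reader.fieldnames if c not in drop]
    rows, problems = [], []
    # Record numbers count the header as record 1.
    for record_no, record in enumerate(reader, start=2):
        try:
            rows.append({c: float(record[c]) for c in keep})
        except (TypeError, ValueError):
            # Empty cell or text such as "n/a" raises ValueError;
            # a short row yields None, which raises TypeError.
            problems.append(record_no)
    return keep, rows, problems

# Hypothetical sample data with one missing and one non-numeric value.
sample = """Name,Height,Weight,Points
Athlete A,72,190,12.5
Athlete B,75,,8.0
Athlete C,70,175,n/a
Athlete D,78,220,15.1
"""
attrs, rows, bad = load_numeric_rows(sample)
print(attrs, len(rows), bad)  # ['Height', 'Weight', 'Points'] 2 [3, 4]
if len(rows) < 30:
    print("Fewer than 30 usable observations; consider collecting more.")
```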
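The Correlation Matrix operator in step 7 computes a coefficient for every pair of attributes. The following minimal sketch shows what such a matrix contains, assuming the coefficients are Pearson's r (the usual choice for this kind of matrix). The helper names and sample columns are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.
    Returns NaN when either list is constant (zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """columns maps attribute name -> list of values; returns a nested dict
    where matrix[a][b] is the coefficient between attributes a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical numeric attributes (names already removed).
data = {
    "Height": [70, 72, 75, 78],
    "Weight": [175, 190, 205, 220],
    "Points": [15, 12, 10, 8],
}
matrix = correlation_matrix(data)
for a in matrix:
    print(f"{a:>7}", " ".join(f"{matrix[a][b]:6.2f}" for b in matrix))
```

The diagonal is always 1 (each attribute against itself), and the matrix is symmetric, so only the values above or below the diagonal need interpreting.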
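To answer "How strong are they?" in step 8 consistently, it can help to label each coefficient by direction and size. The sketch below does this; the cut-offs (0.8, 0.6, 0.4) are an assumed rule of thumb rather than a fixed standard, and the function name `describe` is hypothetical, so adjust the thresholds to whatever guidance you are following.

```python
def describe(r, strong=0.8, moderate=0.6, weak=0.4):
    """Label a correlation coefficient by strength and direction.
    The default cut-offs are an assumed rule of thumb."""
    if r != r:  # NaN, e.g. from a constant attribute
        return "undefined"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    elif size >= weak:
        strength = "weak"
    else:
        return "little or no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for r in (0.92, -0.65, 0.45, 0.10):
    print(f"{r:+.2f}: {describe(r)}")
```

A label is only a starting point: a strong correlation between two attributes still does not show that one causes the other.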