Correlation - Data Mining for the Masses


likely to be the most interesting. You may include the athletes' names; however, keep in

mind that correlation coefficients can only be calculated for numeric data, so the name attribute

would need to be removed from your data set before creating your correlation matrix.

(Remember the Select Attributes operator!)

3) Look up the statistics for each of your selected attributes and enter them as observations

into your spreadsheet. Try to find as many as you can; at least thirty is a good rule of

thumb for achieving a basic level of statistical validity. More is better.

4) Once you've created your data set, use the menu to save it as a CSV file. Click File, then

Save As. Enter a file name, and change 'Save as type:' to Text CSV (.csv). Be sure to

save the file in your data mining data folder.

5) Open RapidMiner and import your data set into your RapidMiner repository. Name it

Chapter4Exercise, or something descriptive so that you will remember what data are

contained in the data set when you look in your repository.

6) Add the data set to a new process in RapidMiner. Ensure that the out port is connected to

a res port and run your model. Save your process with a descriptive name if you wish.

Examine your data in results perspective and ensure there are no missing, inconsistent, or

other potentially problematic data that might need to be handled as part of your Data

Preparation phase. Return to design perspective and handle any data preparation tasks that

may be necessary.

7) Add a Correlation Matrix operator to your stream and ensure that the mat port is

connected to a res port. Run your model again. Interpret your correlation coefficients as

displayed on the matrix tab.

8) Document your findings. What correlations exist? How strong are they? Are they

surprising to you and if so, why? What other attributes would you like to add? Are there

any you'd eliminate now that you've mined your data?
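
If you would like to cross-check the matrix RapidMiner produces in steps 5 through 7, the same workflow can be sketched outside the tool. The following is a minimal Python sketch using only the standard library, not a description of how RapidMiner works internally. It assumes the standard Pearson correlation coefficient. The column names (name, height_cm, weight_kg, points) and the four sample rows are invented purely for illustration; a real data set should have thirty or more observations, as noted in step 3. Attributes that contain any non-numeric or missing value, such as the athletes' names, are left out of the matrix.

```python
import csv
import io
import math

# Made-up sample data for illustration only; replace with your own CSV contents.
SAMPLE = """name,height_cm,weight_kg,points
Athlete A,180,75,10
Athlete B,190,85,12
Athlete C,200,95,18
Athlete D,170,65,8
"""

def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per observation."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def numeric_attributes(rows):
    """Return the attribute names whose values all parse as numbers."""
    names = []
    for name in rows[0]:
        try:
            for row in rows:
                float(row[name])
        except (ValueError, TypeError):
            continue  # text (e.g. a name) or a missing value: skip attribute
        names.append(name)
    return names

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # A constant attribute has no variance, so its correlation is undefined.
    return cov / denom if denom else float("nan")

def correlation_matrix(rows):
    """Build a nested dict: matrix[a][b] is the correlation of a with b."""
    names = numeric_attributes(rows)
    columns = {name: [float(row[name]) for row in rows] for name in names}
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

matrix = correlation_matrix(load_rows(SAMPLE))
for attribute, row in matrix.items():
    print(attribute, {other: round(r, 3) for other, r in row.items()})
```

To try it on the file saved in step 4, read the file's text and pass it to load_rows in place of SAMPLE. Coefficients close to 1 or -1 indicate strong correlations, and coefficients near 0 indicate weak ones, which is the same reading you apply to the matrix tab in step 7.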
