Linear Regression - Data Mining for the Masses

3) Import both of your data sets into your RapidMiner repository. Be sure to give them descriptive names. Drag and drop them into a new process, and rename them as Training and Scoring so that you can tell them apart.

4) Use a Set Role operator to designate the Salary attribute as the label for the training data.

5) Add a Linear Regression operator to the training data stream, then use an Apply Model operator to apply the resulting model to your scoring data set.

6) Run your model. In the results perspective, examine your attribute coefficients and the predictions for the athletes' salaries in your scoring data set.

7) Report your results:

a. Which attributes have the greatest weight?

b. Were any attributes dropped from the data set as non-predictors? If so, which ones, and why do you think they weren't effective predictors?

c. Look up the actual salaries of a few of the athletes in your scoring data and compare them to the predicted salaries. Are they close? Why or why not, do you think?

d. What other attributes do you think would help your model better predict professional athletes' salaries?
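
The exercise itself is carried out in RapidMiner's graphical interface. For readers who want to see what steps 4 through 6 amount to conceptually, the following is a rough Python sketch that uses only the standard library. Everything in it is an assumption made for illustration: the attribute names (points_per_game, years_pro), the salary figures, and the helper functions fit_linear_regression and predict are hypothetical and are not taken from the exercise's data sets. The sketch fits plain ordinary least squares and does not drop weak predictors, which RapidMiner may do (see question 7b).

```python
# Hypothetical training data: each row is (points_per_game, years_pro).
# The salaries (the label, as in step 4) were made up using the rule
# salary = 100 + 20 * points_per_game + 5 * years_pro.
training_rows = [(10, 1), (20, 3), (15, 5), (30, 2)]
training_salaries = [305, 515, 425, 710]

# Hypothetical scoring data: same attributes, no salary column.
scoring_rows = [(25, 4), (12, 6)]


def fit_linear_regression(rows, labels):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 = intercept
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, labels))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("attributes are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(coefs, row):
    """Apply the fitted model to one scoring row (step 5)."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Step 6: inspect the coefficients and the predicted salaries.
coefs = fit_linear_regression(training_rows, training_salaries)
predictions = [predict(coefs, row) for row in scoring_rows]
print("intercept and coefficients:", coefs)
print("predicted salaries:", predictions)
```

Because the made-up salaries follow an exact linear rule, the fitted coefficients should come out very close to that rule. Real salary data will not fit this cleanly, which is what questions 7a through 7c ask you to examine.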
