them back to their source records which might contain the new clients' addresses, enabling her to
break the predictions down by city, county, or region of the country. Sarah could then work with
her colleagues in Operations and Order Fulfillment to ensure that regional heating oil distribution
centers around the country have appropriate amounts of stock on hand to meet anticipated need.
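Joining scored predictions back to their source records and totaling them by geography can be sketched as follows. This is a minimal illustration under stated assumptions: the record IDs, the `region` field, the numbers, and the helper name `totals_by_region` are all hypothetical, and the sketch sums predicted usage per region as one possible way to size regional stock.

```python
from collections import defaultdict

# Hypothetical scored output: record ID -> predicted heating oil usage.
predictions = {101: 212.0, 102: 180.5, 103: 240.0, 104: 150.0}

# Hypothetical source records that hold each new client's region.
source_records = {
    101: {"region": "Northeast"},
    102: {"region": "Midwest"},
    103: {"region": "Northeast"},
    104: {"region": "Midwest"},
}


def totals_by_region(predictions, source_records):
    """Join predictions to source records on ID and sum them per region."""
    totals = defaultdict(float)
    for record_id, predicted in predictions.items():
        region = source_records[record_id]["region"]
        totals[region] += predicted
    return dict(totals)


print(totals_by_region(predictions, source_records))
# {'Northeast': 452.0, 'Midwest': 330.5}
```

The same join works for city or county: swap in whichever field the source records actually carry.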
If Sarah wanted to get even more granular in her analysis of these data, she could break her
training and scoring data sets down by month using a month attribute, and then run the
predictions again to reveal fluctuations in usage over the course of the year.
CHAPTER SUMMARY
Linear regression is a predictive model that uses training and scoring data sets to generate numeric
predictions. It is important to remember that linear regression uses numeric data types for
all of its attributes. It uses the algebraic formula for the slope of a line to fit an imaginary line
through the training data, and then determines where each scoring observation would fall along that
line. Each attribute in the data set is evaluated statistically for its ability to predict the target
attribute. Attributes that are not strong predictors are removed from the model. Those attributes that
are good predictors are assigned coefficients that give them weight in the prediction formula. Any
observation whose attribute values fall within the ranges of the corresponding training attribute
values can be plugged into the formula in order to predict the target.
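The fit-then-plug-in idea can be sketched with ordinary least squares. This is a minimal pure-Python illustration, not the procedure any particular tool uses: it solves the normal equations by Gauss-Jordan elimination, does not perform the step that drops weak predictors, and uses hypothetical data generated from target = 10 + 2·x1 - 3·x2. The function names `fit_linear_regression` and `predict` are illustrative.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of rows of numeric attributes; an intercept column of 1s is
    prepended. Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Build the augmented system [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coefs, row):
    """Apply target = b0 + b1*x1 + ... + bk*xk to one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical training data (no noise, so the fit recovers 10, 2, -3).
X_train = [[1, 1], [2, 1], [1, 2], [3, 4], [4, 2]]
y_train = [9, 11, 6, 4, 12]

coefs = fit_linear_regression(X_train, y_train)
print([round(c, 6) for c in coefs])      # [10.0, 2.0, -3.0]
print(round(predict(coefs, [2, 3]), 6))  # 5.0
```

The scoring row `[2, 3]` lies inside the training ranges of both attributes, which is the condition under which plugging it into the formula is considered valid.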
Once linear regression predictions are calculated, the results can be summarized in order to
determine whether there are differences in the predictions across subsets of the scoring data. As
more data are collected, they can be added to the training data set in order to make it more robust,
or to expand the ranges of some attributes to include even more values. It is very
important to remember that the ranges for the scoring attributes must fall within the ranges for the
training attributes in order to ensure valid predictions.
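Checking that scoring observations stay inside the training ranges can be sketched in a few lines. This minimal example assumes every attribute is numeric and that both data sets list attributes in the same column order; the data and the helper names `attribute_ranges` and `out_of_range` are hypothetical.

```python
def attribute_ranges(rows):
    """Return (min, max) for each attribute column of the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def out_of_range(scoring_rows, ranges):
    """Return indices of scoring rows with any attribute outside its training range."""
    flagged = []
    for i, row in enumerate(scoring_rows):
        if any(not (lo <= v <= hi) for v, (lo, hi) in zip(row, ranges)):
            flagged.append(i)
    return flagged


training = [[1, 1], [2, 1], [1, 2], [3, 4], [4, 2]]
scoring = [[2, 3], [5, 1], [3, 0]]

ranges = attribute_ranges(training)
print(ranges)                         # [(1, 4), (1, 4)]
print(out_of_range(scoring, ranges))  # [1, 2]
```

The bounds are inclusive, and flagged rows are reported rather than silently dropped, so they can be removed from the scoring set or held until the training data cover their values.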
REVIEW QUESTIONS
1) What data type does linear regression expect for all attributes? What data type will the
predicted attribute be when it is calculated?
2) Why are the attribute ranges so important when doing linear regression data mining?