Sarah, the regional sales manager from the Chapter 4 example, is back for more help. Business is
booming, her sales team is signing up thousands of new clients, and she wants to be sure the
company will be able to meet this new level of demand. She was so pleased with our assistance in
finding correlations in her data that she now hopes we can help her with prediction as well.
She knows that there is some correlation between the attributes in her data set (things like
temperature, insulation, and occupant ages), and she's now wondering if she can use the data set
from Chapter 4 to predict heating oil usage for new customers. These new customers haven't begun
consuming heating oil yet, and there are a lot of them (42,650 to be exact), so she wants to know
how much oil she should keep in stock to meet their demand. Can she use data mining to examine
household attributes and known past consumption quantities to anticipate and meet her new
customers' needs?
After completing the reading and exercises in this chapter, you should be able to:
Explain what linear regression is, how it is used and the benefits of using it.
Recognize the necessary format for data in order to perform predictive linear regression.
Explain the basic algebraic formula for calculating linear regression.
Develop a linear regression data mining model in RapidMiner using a training data set.
Interpret the model's coefficients and apply them to a scoring data set in order to deploy
the model.
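As a preview of the last three objectives, the following is a minimal sketch of the idea behind linear regression with a single predictor: fit a line y = b + m*x to a training data set by least squares, then apply the resulting coefficients to a scoring data set. It is written in plain Python rather than RapidMiner, and the function names and the numbers are invented purely for illustration; they are not Sarah's data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, xs):
    """Apply the fitted coefficients to new (scoring) observations."""
    return [intercept + slope * x for x in xs]


# Hypothetical training data: one attribute and the known outcome.
# The values are made up so that they lie exactly on y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(train_x, train_y)

# Hypothetical scoring data: attribute values with no known outcome yet.
scoring_x = [5.0, 6.0]
predictions = predict(slope, intercept, scoring_x)
print(slope, intercept, predictions)
```

With several predictor attributes, as in Sarah's data set, the same idea extends to one coefficient per attribute plus an intercept.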