As with classification, to construct a regression model, a dataset containing
data from past events is used. That dataset needs to contain values for both the
input and output attributes. Once constructed and validated, the model can be
used in the future to make predictions when the input attribute values are known,
but the value of the output attribute does not yet exist or is unknown. For
example, a manufacturer may want to predict sales based on planned advertising,
pricing, and other inputs; an insurance underwriter may want to predict
future expected loss amounts; a bookie may want to predict the point spread of a
sporting event; or an economist may attempt to predict economic growth.
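
The build-then-predict workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the advertising and sales figures are made up, the names `fit_simple_regression` and `predict` are hypothetical, and a single input attribute with a straight-line relationship is assumed.

```python
# Hypothetical historical data: advertising spend (in $1,000s) and unit sales.
# Both the input and the output attribute are known for these past periods.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 17.0, 19.0, 23.0]


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Step 1: construct the model from past events.
intercept, slope = fit_simple_regression(ad_spend, sales)


def predict(x_new):
    """Predict the output for an input value whose outcome is not yet known."""
    return intercept + slope * x_new


# Step 2: use the model for a planned spend level with no observed sales yet.
planned_spend = 6.0
forecast = predict(planned_spend)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

In practice the model would also be validated on data held back from fitting before any forecast is trusted; that step is omitted here for brevity.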
As in any data mining application, another potential use of regression
analysis is simply to gain a better understanding of relationships between
input attributes and the targeted output attribute. For example, a marketing
research analyst may wish to better understand, and potentially quantify, the
relationship between advertising and sales. What is the contribution of each new
advertising dollar to sales, and is that contribution constant, or are there
diminishing returns beyond a certain level? Is there a threshold below which
advertising does little or no good? A home building company,
in evaluating potential new home designs, may want to know if the addition of a
formal dining room would increase the value enough to cover the added cost.
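
One rough way to look at the question of constant versus diminishing returns is to compute the incremental sales per added advertising dollar between successive spending levels. The sketch below assumes made-up spend and sales figures and hypothetical variable names; it is a descriptive check only, not a fitted regression model.

```python
# Hypothetical observations: advertising spend and resulting sales,
# listed in order of increasing spend.
spend = [0, 10, 20, 30, 40]
sales = [100, 150, 185, 205, 215]

# Incremental sales per additional unit of spend between adjacent levels.
marginal = [
    (sales[i + 1] - sales[i]) / (spend[i + 1] - spend[i])
    for i in range(len(spend) - 1)
]

# Returns are diminishing if every increment is smaller than the one before.
diminishing = all(later < earlier for earlier, later in zip(marginal, marginal[1:]))

print("marginal contribution per unit of spend:", marginal)
print("diminishing returns:", diminishing)
```

A constant contribution would show up as roughly equal values in `marginal`, while a threshold effect would show up as near-zero values at the lowest spending levels.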
A more difficult task when evaluating relationships is identifying
interactions between inputs. For example, the response to advertising
may differ between female and male potential customers, or the benefit
of a dining room is probably different for a 1,000-square-foot house
than for an 8,000-square-foot house.
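
An interaction of this kind can be made concrete by fitting a separate line for each group and comparing the slopes. The sketch below assumes two hypothetical customer groups with made-up response data, and the helper name `slope_of` is illustrative. A nonzero difference in slopes means the effect of advertising depends on the group, which is what an interaction is.

```python
# Hypothetical data: the same advertising levels shown to two customer groups,
# with the observed response (for example, purchases) for each group.
ad_level = [1.0, 2.0, 3.0, 4.0]
response_group_a = [5.0, 7.0, 9.0, 11.0]
response_group_b = [4.0, 5.0, 6.0, 7.0]


def slope_of(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


slope_a = slope_of(ad_level, response_group_a)
slope_b = slope_of(ad_level, response_group_b)

# The interaction effect: how much the advertising slope changes with group.
interaction = slope_a - slope_b

print(f"slope for group A: {slope_a:.2f}")
print(f"slope for group B: {slope_b:.2f}")
print(f"difference in slopes (interaction): {interaction:.2f}")
```

Fitting one line per group is equivalent to a single model that includes a group indicator and its product with the advertising input; the per-group version is used here only because it needs no matrix algebra.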
Correlation and Causation
Before proceeding, remember that a significant relationship between an
input attribute and an output attribute does not mean that the input
attribute causes the output. The opposite may be true: a change in the
output variable causes a corresponding change in the input variable, and
that is why the relationship exists. For example, if a positive
relationship is observed between Diet Coke consumption and obesity, does that
mean Diet Coke consumption leads to obesity? Or might it be that being obese
leads one to consume more Diet Coke?
In other situations, a third, unmeasured attribute may cause both. For
example, when a relationship exists between shoe size and basketball playing
ability, does that imply that wearing bigger shoes will improve one's playing
ability, or that increased practice on the basketball court will force you into
bigger shoes? The answer to both is obviously “no”; a more plausible explanation
is that a third attribute, such as height, influences both.
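
A small simulation can show how a third attribute produces a relationship between two attributes that do not affect each other. The sketch below assumes height as the hypothetical third attribute and uses invented coefficients and noise levels. Shoe size and skill are each generated from height alone, yet they come out correlated; once the part explained by height is removed from both, the remaining correlation is close to zero.

```python
import math
import random

random.seed(0)  # fixed seed so the simulation is repeatable


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical population: height drives both shoe size and playing skill.
# Neither shoe size nor skill appears in the formula for the other.
n = 5000
height = [random.gauss(180, 10) for _ in range(n)]
shoe_size = [0.15 * h + random.gauss(0, 1) for h in height]
skill = [0.5 * h + random.gauss(0, 5) for h in height]

# Raw correlation between the two attributes.
raw_corr = pearson(shoe_size, skill)

# Correlation after removing the linear effect of height from both
# (the partial correlation given height).
partial_corr = pearson(residuals(shoe_size, height), residuals(skill, height))

print(f"raw correlation: {raw_corr:.3f}")
print(f"correlation controlling for height: {partial_corr:.3f}")
```

This adjustment works only because height was measured in the simulation; with field data the third attribute is often unmeasured, so it cannot be removed in this way.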
The only theoretically sound source from which definitive statements about
cause and effect can be drawn is data collected in a well-designed and tightly controlled
experimental setting. When working with field data, the source of almost all
 