Databases Reference
In-Depth Information
data mining content, statements of causation may only be surmised based on
hypothetical models of the processes under study.
Given that data mining, by definition, begins without any preconceived
hypotheses, be wary of conclusions derived from the patterns uncovered.
Algorithms for Regression Analysis
A simple, well-studied, popular, and long-used algorithm for regression is linear
regression. In its simplest form, the model is defined as follows:
Y i ¼ b 0 þ b 1 X i þ e i
where:
i th observation
Y i
is the value of the output variable for the
i th observation
X i
is the value of the input variable for the
b 0 is the Y intercept
b 1 is the slope or coefficient of input variable
X
is the random error term for the i th observation.
e i
This model is usually referred to as simple linear regression, because it has
only one input. Linear regression can be extended with more input variables as:
Y i ¼ b 0 þ b 1 X 1 i þ b 2 X 2 i þ ...þ b p 1 X p 1 i þ e i
where:
p
1 is typically used to refer to
the number of input variables as there is one more implied input
1 is the number of input variables. Note:
p
X 0 whose
value is always 1. Hence, a total of p variables when X 0 is included.
With multiple inputs, the model is referred to as multiple linear regression.
As the model has been defined, the task of multiple linear regression becomes
that of generating estimates for all b coefficients using a dataset containing
values for both the output variable and all input variables. VisMiner employs a
common method for generating these estimates: ordinary least squares.A
description of the method is not included here. For the interested reader, there
are numerous topics and websites available describing the method.
Assessing Regression Model Performance
Regression model performance is defined as a measure of how well the model
predicts the output for a given input. In other words, how small is the error term?
 
Search WWH ::




Custom Search