Table 2.1 Different Analytical Techniques are Suited to Different Types of Input Data and Provide Different Types of Output Data

Technique                          Predictors                   Output
Discriminant Analysis              Continuous or dummies        Categorical
Linear Regression, ANOVA           Continuous or dummies        Continuous
Logistic/multinomial regression    Continuous or dummies        Categorical
Ordinal Regression                 Categorical or continuous    Ordinal
Time Series Analysis               Continuous or dummies        Continuous
4.2 Linear regression
A linear regression produces a “line of best fit” through a scatterplot. The criterion used
to fit a line to a given dataset is the minimization of the sum of the squared vertical
deviations of the points from the line (the residuals). The output of the procedure is usually a visual
image of the data and the line of best fit (Figure 2.3); a measure of the goodness of
fit of the line, $R^2$, ranging between 0.0 and 1.0, with 1.0 representing a perfect fit; and
the equation of the line. The presence of outliers in the data can reduce the $R^2$ value.
For a two-variable analysis the equation of the line is of the form $y = Ax + B$, and this
equation can be used to predict the value of new observations.
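The two-variable case can be sketched in a few lines of plain Python. The function below computes the least-squares slope and intercept of $y = Ax + B$, together with $R^2$. The name `fit_line` and the data are hypothetical illustrations and are not taken from any particular statistics package. In practice a statistical package would perform this fit and also report standard errors.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = A*x + B; returns (A, B, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a = sxy / sxx            # slope that minimises the sum of squared residuals
    b = mean_y - a * mean_x  # the fitted line passes through (mean_x, mean_y)
    # R^2 = 1 - (residual sum of squares) / (total sum of squares about the mean).
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # If every y is identical the line fits them exactly, so report a perfect fit.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r_squared


# Usage with made-up data: fit the line, then predict a new observation at x = 6.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
A, B, r2 = fit_line(xs, ys)
prediction = A * 6 + B
print(f"y = {A:.2f}x + {B:.2f}, R^2 = {r2:.3f}, prediction at x=6: {prediction:.2f}")
```

The closed-form slope `sxy / sxx` is the value that minimises the sum of squared residuals. No iterative search is needed in the two-variable case.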
Multiple regression analysis is a simple extension of linear regression to incorporate
multiple predictor variables. Multi-dimensional data is much harder to visualise than a
simple two-dimensional scatterplot, but the purpose of multiple regression is
exactly the same: to produce an equation that can be used to predict the value of
new observations. This equation is of the form

$y = B_1 x_1 + B_2 x_2 + B_3 x_3 + \cdots + A$,

where the $B$s are the coefficients (multipliers) on the predictor variables $x$, and $A$ is a constant.
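The same least-squares criterion extends to several predictors. The sketch below uses plain Python and made-up data. It estimates the $B$s and the constant $A$ by solving the normal equations $(X^\top X)\beta = X^\top y$ with Gaussian elimination. The names `fit_multiple` and `solve` are hypothetical helpers written for this illustration.

```python
def solve(m, v):
    """Solve the linear system m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]  # augmented copy; inputs untouched
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_multiple(rows, ys):
    """Least-squares fit of y = B1*x1 + ... + Bk*xk + A.

    rows: one list of k predictor values per observation.
    Returns ([B1, ..., Bk], A).
    """
    # Append a constant 1 to each observation so that A is estimated as a coefficient.
    X = [list(r) + [1.0] for r in rows]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    return beta[:-1], beta[-1]


# Made-up data generated exactly from y = 2*x1 - 3*x2 + 5.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
ys = [5, 7, 2, 4, 0]
B, A = fit_multiple(rows, ys)
new_obs = [3, 1]
prediction = sum(b * x for b, x in zip(B, new_obs)) + A
print(B, A, prediction)
```

The normal equations keep the sketch short, but they can lose precision when predictors are nearly collinear. Statistical packages typically use a QR decomposition instead.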
Regression algorithms rely upon a number of assumptions: that the variables are
on an interval scale (i.e. one unit change has the same meaning throughout the range
of the data); that the residuals have a normal distribution; and that the residuals are
independent of the predicted values. Most statistical software packages will allow the
user to check whether these assumptions are met.
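Two of these assumptions can be screened informally from the residuals alone. The sketch below computes the residuals, their skewness as a rough indicator of departure from normality, and the correlation between the absolute residuals and the predicted values as a rough indicator that the residuals depend on the predicted values. The helper names and numbers are hypothetical. These are screening heuristics, not formal tests; a residual plot or the diagnostics built into a statistics package are the usual tools.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def residual_diagnostics(ys, fitted):
    """Rough, informal checks on regression residuals (not formal hypothesis tests)."""
    residuals = [y - f for y, f in zip(ys, fitted)]
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
    # Sample skewness: values far from 0 suggest the residuals are not normally distributed.
    skewness = sum(((r - mean) / sd) ** 3 for r in residuals) / n if sd > 0 else 0.0
    # For a least-squares fit that includes a constant, the raw residuals are
    # uncorrelated with the fitted values by construction, so correlate their
    # absolute size instead: a clear trend means the spread changes with the prediction.
    spread_trend = pearson([abs(r) for r in residuals], fitted)
    return {"residuals": residuals, "skewness": skewness, "spread_trend": spread_trend}


# Made-up example in which the size of the residuals grows with the fitted value.
fitted = [1, 2, 3, 4, 5]
ys = [1.1, 1.8, 3.3, 3.6, 5.5]
report = residual_diagnostics(ys, fitted)
print(report["skewness"], report["spread_trend"])
```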
There are many different types of regression, including logistic, multinomial and
ordinal regression. These algorithms take different types of input data and fit different
types of curves to the data (Figure 2.4).
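Logistic regression, for example, fits an S-shaped (sigmoid) curve that gives the probability of a binary outcome, which is then turned into a category. The sketch below assumes one continuous predictor and made-up labels. It fits the curve by batch gradient ascent on the log-likelihood. Statistical packages usually use Newton-type methods such as iteratively reweighted least squares; gradient ascent is used here only to keep the example short. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=5000):
    """Fit P(label = 1 | x) = sigmoid(w*x + c) by batch gradient ascent on the log-likelihood."""
    w, c = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_c = 0.0
        for x, t in zip(xs, labels):
            err = t - sigmoid(w * x + c)  # observed label minus predicted probability
            grad_w += err * x
            grad_c += err
        w += lr * grad_w / n
        c += lr * grad_c / n
    return w, c


def predict_class(w, c, x, threshold=0.5):
    """Turn the fitted probability into a category label (0 or 1)."""
    return 1 if sigmoid(w * x + c) >= threshold else 0


# Made-up binary data; the classes overlap between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
w, c = fit_logistic(xs, labels)
print([predict_class(w, c, x) for x in (1, 8)])
```

The overlapping classes keep the maximum-likelihood estimates finite. With perfectly separated classes, `w` would keep growing as `epochs` increases.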
4.3 Discriminant analysis
Discriminant analysis is a way to build classifiers: that is, the algorithm uses labelled
training data to build a predictive model of group membership, which can then be applied
to new cases. While linear regression produces a real value as output, discriminant
analysis produces class labels. As with regression, discriminant analysis can be linear,
attempting to find a straight line that separates the data into categories, or it can fit any of
a variety of curves (Figure 2.5). It can be two-dimensional or multidimensional; in higher