heterogeneity within groups is high, the analyst is encouraged to explore other statistical
procedures, such as logistic regression (Jantz and Ousley, 2005).
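For readers who want to try that alternative, the following is a minimal sketch of a logistic-regression classifier using scikit-learn; the synthetic data, group parameters, and variable names are illustrative assumptions, not values from the source.

    # A minimal logistic-regression sketch with synthetic data (group means,
    # sample sizes, and variables are hypothetical, for illustration only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Two synthetic groups measured on two variables (e.g., GOL and XCB).
    X = np.vstack([rng.normal(180, 6, (30, 2)), rng.normal(172, 6, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict_proba(X[:1]))  # membership probabilities for one case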
Two additional considerations in DFA are outliers and multicollinearity. Discriminant func-
tion analysis is sensitive to the inclusion of outliers (individuals or measurements falling far
outside the collective distribution of all other individuals or measurements). The researcher
should carefully consider the data through graphs (plots) and descriptive statistics to identify
potential outliers. If outliers are found, the cause for each should be identified, when
possible. Remember, transcription errors (e.g., 24 entered as 42), incorrect data entry
(entering maximum cranial breadth (XCB) for maximum cranial length (GOL)), and
measurements that are just wrong (XCB measured as 145 when it is in fact 120) may lead
to outliers. When these types of errors are identified, the data should be corrected. If no
explanation can be found, the individual may be dropped from the analysis, unless there is
good reason to believe he or she simply represents an extreme of the variation seen in that
population.
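As a practical aid, the sketch below flags candidate outliers with a simple z-score screen. The DataFrame and column names (GOL, XCB) are assumed for illustration, and flagged cases still require the case-by-case review described above.

    # A simple outlier screen, assuming a pandas DataFrame with one row per
    # individual and craniometric columns such as GOL and XCB (illustrative
    # column names).
    import pandas as pd

    def flag_outliers(df: pd.DataFrame, columns: list, z_cutoff: float = 3.0) -> pd.DataFrame:
        """Return rows whose value in any listed column lies more than
        z_cutoff standard deviations from that column's mean."""
        z = (df[columns] - df[columns].mean()) / df[columns].std(ddof=1)
        return df[(z.abs() > z_cutoff).any(axis=1)]

    # Inspect flagged individuals before deciding whether each reflects a
    # transcription error (e.g., 24 entered as 42) or genuine variation:
    # suspects = flag_outliers(crania, ["GOL", "XCB"])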
Multicollinearity is the same as trait interdependence (correlation). When two variables
are highly correlated (or one is the sum of other variables in the model), the parameter
estimates behave erratically when the model (or the variables) undergoes even minute
changes. While this does not affect the overall model, it does affect classifications based
on that model. In other words, collinearity also means the standardized discriminant
function coefficients cannot reliably assess the relative importance of the predictor
variables, decreasing the overall strength of the final discriminant function for
classification purposes. As with outliers, pairwise (two-dimensional) plots of the
variables will assist in identifying highly correlated variables, as will a correlation
matrix such as the one sketched below.
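The sketch below illustrates one such check: scanning a correlation matrix for highly correlated measurement pairs. The cutoff of 0.9 is an illustrative assumption, not a published threshold.

    # A quick collinearity check over the same kind of DataFrame: pairs of
    # measurements with an absolute correlation above the cutoff are
    # candidates for dropping or combining before running the DFA.
    import pandas as pd

    def correlated_pairs(df: pd.DataFrame, cutoff: float = 0.9):
        corr = df.corr().abs()
        cols = corr.columns
        return [(cols[i], cols[j], corr.iloc[i, j])
                for i in range(len(cols))
                for j in range(i + 1, len(cols))
                if corr.iloc[i, j] > cutoff]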
Two additional statistics that can be obtained from the discriminant function analysis
provide further information about the classification. The FORDISC 3.0 help file (Jantz and
Ousley, 2005) goes into great detail about posterior and typicality probabilities, but a brief
explanation will help the reader better understand some of the analyses described below.
Posterior probability is the probability that the unknown belongs to any one of the
populations selected for the analysis and is based on the relative distances of the unknown
(calculated using the Mahalanobis distance, D²) to each population. Because it is the
probability of belonging to one of the populations used in the analysis, the posterior
probabilities will always sum to 1. A major assumption (of classification statistics in general) is
that the unknown individual truly belongs to one of the reference groups (hence the need
for strict guidelines when selecting reference samples), because a DFA will always “force”
a classification.
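The following sketch shows how posterior probabilities follow from Mahalanobis distances under the classical assumptions of equal priors and a pooled within-group covariance matrix; it illustrates the idea rather than FORDISC's exact computation. Note how the normalization step guarantees an assignment to some group, which is why the reference-sample assumption matters.

    # Posterior probabilities from Mahalanobis distances, assuming equal
    # priors and a pooled within-group covariance matrix (a sketch of the
    # classical linear DFA setting, not FORDISC's internal code).
    import numpy as np

    def posteriors(x, group_means, pooled_cov):
        """x: measurement vector; group_means: dict of name -> mean vector;
        pooled_cov: pooled within-group covariance matrix."""
        inv = np.linalg.inv(pooled_cov)
        d2 = {g: float((x - m) @ inv @ (x - m)) for g, m in group_means.items()}
        # With equal priors, P(group | x) is proportional to exp(-D^2 / 2);
        # normalizing makes the posteriors sum to 1, as described above.
        w = {g: np.exp(-v / 2.0) for g, v in d2.items()}
        total = sum(w.values())
        return {g: v / total for g, v in w.items()}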
We can use another statistic, typicality probability, as a measure of how likely it is that the
unknown does, in fact, belong to any one of those populations. Typicality probability is based
on the absolute distances of the unknown from all groups, rather than the relative distances.
Note that the typicality probability is interpreted much like the p-value of a univariate
t-test. In other words, it is a measure of the proportion of individuals in a population that
would be expected to lie as far from, or farther from, that population's centroid as the
unknown individual.
As Jantz and Ousley (2005:np) point out, "[typicality probabilities] below 0.05 (5%), or
certainly 0.01 (1%), for a group indicate questionable probability of membership in that
group or the possibility of measurement error." This means that the typicality probability
can essentially be ignored if the value is greater than 0.05, since such values do not indicate
a statistically significant difference between the unknown and the group's suite of
measurements. When the value is less than 0.05, membership in that group becomes
questionable and the possibility of measurement error should be considered.
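For illustration, a chi-square-based typicality probability can be computed directly from D² and the number of measurements. FORDISC reports additional variants, so treat this as a sketch of the general idea rather than its exact method.

    # Chi-square-based typicality probability: the probability that a true
    # member of the group would lie as far from, or farther from, the
    # centroid than the unknown does.
    from scipy.stats import chi2

    def typicality_chi2(d2, n_vars):
        """d2: Mahalanobis D² to the group; n_vars: number of measurements."""
        return float(chi2.sf(d2, df=n_vars))

    # A value below 0.05 for every reference group suggests the unknown is
    # atypical of all of them, or that a measurement error is present.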