8.5.3 Cluster analysis
Cluster analysis aims to classify individual units or subjects into clusters of similar
units on the basis of a matrix of similarities (or distances) between all pairs of
subjects to be clustered (Everitt et al., 2001). The similarity (distance) matrix is
calculated from the observed or measured disease-related variables. A hierarchical
clustering approach, in which an object is permitted to be a member of only one
cluster, is often used.
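As an illustration of how a distance matrix might be calculated from measured variables, the following is a minimal sketch using only the Python standard library. The data values, the choice of two variables, the standardization step and the use of Euclidean distance are all assumptions made for this example; they are not taken from the studies cited in this section. The helper names standardize and distance_matrix are hypothetical.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit (sample) standard deviation.

    Assumes every column has non-zero variance.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def distance_matrix(rows):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d

# Hypothetical data: each row is one epidemic; the columns are
# (final disease incidence in %, days from sowing to epidemic onset).
epidemics = [
    [80.0, 20.0],
    [75.0, 25.0],
    [20.0, 60.0],
    [15.0, 65.0],
]

D = distance_matrix(standardize(epidemics))
```

Standardizing first keeps a variable measured on a large scale from dominating the distances. Whether that is appropriate depends on the variables involved, which is part of the first decision discussed in the next paragraph.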
There are two key decisions to be made when applying cluster analysis. The first is
which matrix (similarity or dissimilarity) should be used for clustering and which
method should be used to calculate it. The second is the choice of clustering
method. For example, if two objects are already considered to form a cluster, what
is the criterion for deciding whether a third object should join the same cluster?
Should the criterion be based on its closeness to at least one member, to all
members, or to the centre of the existing cluster? Different clustering methods may
lead to different cluster structures.
For instance, nearest-neighbour (single linkage) clustering tends to keep adding new
members to existing clusters, whereas furthest-neighbour (complete linkage)
clustering tends to produce more small clusters. A useful strategy is to apply
several methods and to examine both the consistency of the resulting clusters and
their possible biological implications.
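The difference between the nearest-neighbour and furthest-neighbour criteria can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the procedure of any particular statistical package: the function agglomerate is a hypothetical name, and the six objects are placed at made-up positions on a line so that the gaps between neighbours grow slowly.

```python
def agglomerate(dist, n_clusters, linkage):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    dist:    symmetric distance matrix (list of lists)
    linkage: min for nearest-neighbour (closeness to at least one member),
             max for furthest-neighbour (closeness to all members)
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Between-cluster distance: min or max over all cross pairs.
                d = linkage(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical one-dimensional data with gaps of 1.0, 1.1, 1.2, 1.3 and 1.4.
positions = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0]
dist = [[abs(p - q) for q in positions] for p in positions]

single = agglomerate(dist, 3, min)    # nearest-neighbour
complete = agglomerate(dist, 3, max)  # furthest-neighbour
```

With these positions, stopping at three clusters gives [[0, 1, 2, 3], [4], [5]] for the nearest-neighbour criterion, which keeps extending one cluster, and [[0, 1], [2, 3], [4, 5]] for the furthest-neighbour criterion, which forms three small groups.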
The resulting cluster structure is generally presented as a dendrogram. However,
the clusters formed do not necessarily have any practical or biological significance.
Forming a dendrogram is a simple matter given the accessibility of powerful general
statistical software, but interpreting the cluster structures is a more challenging task.
Further discriminant and canonical variate analysis using the resulting cluster groups
might assist in the interpretation. We must be aware of the danger of inventing a
rational explanation for a particular clustering and then arguing that the cluster
analysis supports this explanation. The danger is particularly great when the final
clustering varies considerably with the clustering method and strategy used. Rather,
we should argue that the cluster analysis supports our prior expectation of cluster
formation, which comes from independent sources or reasoning.
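One way to put a number on how much the final clustering varies with the method is to compare two partitions of the same objects pair by pair. The sketch below computes the Rand index, which is the proportion of object pairs that the two partitions treat in the same way. It is offered as one possible check, not as a method used in the studies cited here, and the cluster labels are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair is an agreement when both partitions put the two objects in the
    same cluster, or both put them in different clusters. The label values
    themselves do not matter, only which objects share a label.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical cluster labels for six epidemics from two clustering methods.
method_1 = [0, 0, 0, 0, 1, 2]
method_2 = [0, 0, 1, 1, 2, 2]

agreement = rand_index(method_1, method_2)  # 10 of the 15 pairs agree
```

A value near 1 indicates that the two methods give nearly the same grouping. A much lower value suggests that any interpretation built on one of the groupings should be treated with caution.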
Kranz (1968) first used cluster analysis in plant epidemiological research.
In a cluster analysis of 40 pathosystems using 13 variables describing epidemic
patterns, twelve groups were identified. This research indicated the huge
variability that may be observed among epidemics even within the same
pathosystem. Cluster analysis led to the identification of two types of epidemics
of bean hypocotyl rot based on the analysis of six variables representing the
characteristics of observed disease patterns (Campbell et al., 1980a). Results
from cluster analysis of experimental data indicated that epidemic development
was related mainly to the date of sowing for Fusarium wilt of chickpea
(Navas-Cortes et al., 1998). Similarly, cluster grouping of observed epidemics of
papaya ringspot was primarily related to site and transplanting date
(Mora-Aguilera et al., 1996).