Biomedical Engineering Reference
Geometric classifiers are based on template matching in which the observed pattern is compared to a
geometric template that represents data in each category. The nearness of the mined data to the
template can be assessed in terms of the number of features in the observed data that match the
template. Conceptual classifiers rely on biological heuristics to define categories, and fuzzy logic
techniques can be used to assign data to a class by degree. Similarity measures may also weight
certain features more than others, according to some measure of separateness. For example, if the
distribution of data is spherical, the data mean may be used.
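As a rough illustration of template matching, the sketch below scores an observation by the number of features that agree with each category's template, with optional per-feature weights. The category names, feature vectors, and weights are hypothetical and chosen only for demonstration; the text does not prescribe a particular scoring rule.

```python
def match_score(observed, template, weights=None):
    """Sum the weights of the positions where observed agrees with template."""
    if weights is None:
        weights = [1.0] * len(template)  # unweighted: plain count of matches
    return sum(w for o, t, w in zip(observed, template, weights) if o == t)


def classify_by_template(observed, templates, weights=None):
    """Return the category whose template has the highest match score."""
    return max(templates,
               key=lambda name: match_score(observed, templates[name], weights))


# Hypothetical binary feature templates for two made-up categories.
templates = {
    "category_a": (1, 1, 0, 0),
    "category_b": (0, 1, 1, 1),
}
observed = (1, 1, 0, 1)

# Unweighted: three features match category_a, two match category_b.
unweighted_label = classify_by_template(observed, templates)

# Weighting the last feature heavily changes the outcome.
weighted_label = classify_by_template(observed, templates, weights=[1, 1, 1, 5])
```

With equal weights the observation falls in category_a; once the last feature is weighted more heavily, category_b wins, which is the effect of weighting certain features more than others.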
Statistical data-mining methods based on structural pattern recognition attempt to describe complex
patterns in terms of simpler patterns. They extract features from the data and represent the
structural features as vectors that are used with statistically determined discriminant functions. They
use a rule base to define structural features in a given class, or transform the data into a descriptive
language based on pattern primitives. The descriptions are then analyzed syntactically to provide the
classification.
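One way to picture the syntactic approach is sketched below: a numeric series is rewritten as a string of pattern primitives, and each class is defined by a small grammar over those primitives. The primitives ('u', 'd', 'f'), the class names, and the use of regular expressions as the grammar are all assumptions made for this example, not a method given in the text.

```python
import re


def to_primitives(samples, tolerance=0.0):
    """Encode consecutive differences as 'u' (up), 'd' (down), or 'f' (flat)."""
    symbols = []
    for previous, current in zip(samples, samples[1:]):
        delta = current - previous
        if delta > tolerance:
            symbols.append("u")
        elif delta < -tolerance:
            symbols.append("d")
        else:
            symbols.append("f")
    return "".join(symbols)


# Hypothetical rule base: each class is a regular expression over primitives.
GRAMMAR = {
    "peak": re.compile(r"f*u+d+f*"),
    "valley": re.compile(r"f*d+u+f*"),
    "plateau": re.compile(r"f+"),
}


def classify_syntactically(samples):
    """Return the first class whose grammar accepts the whole description."""
    description = to_primitives(samples)
    for label, pattern in GRAMMAR.items():
        if pattern.fullmatch(description):
            return label
    return "unclassified"
```

For instance, the series 0, 1, 3, 2, 0 is described as "uudd" and is accepted by the "peak" rule, whereas an oscillating series such as 0, 1, 0, 1, 0 matches none of these rules.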
Predictive modeling, which uses data within a database to predict other missing data, can be based
on continuous numerical variables (regression) or, more frequently, on categorical data
(classification). The major challenge in predictive modeling is to select the input criteria that are most
influential in defining the missing data and to identify the most appropriate transformation. With
continuous numerical variables, nonlinear transformations on the input data are often used. With
categorical data, feature extraction serves the same purpose.
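The role of a nonlinear transformation in regression can be sketched as follows. The example applies a base-10 logarithm to the input before fitting an ordinary least-squares line, then uses the fit to estimate a value that is missing from the data. The variable names and the numbers are invented for illustration, and the logarithm is only one of many possible transformations.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical records: the response grows with the logarithm of the input.
inputs = [1.0, 10.0, 100.0, 1000.0]
responses = [2.0, 4.0, 6.0, 8.0]

# Nonlinear transformation of the input, followed by a linear fit.
slope, intercept = fit_line([math.log10(x) for x in inputs], responses)


def predict(x):
    """Estimate a missing response for input x using the fitted model."""
    return slope * math.log10(x) + intercept


estimated = predict(10000.0)  # fills in a response absent from the records
```

A straight line fitted to the raw inputs would describe these records poorly; after the transformation, the relationship is linear and the missing value can be estimated directly.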
Cluster analysis, also known as data segmentation, groups data into subsets whose members are similar
to each other. It can take a large amount of data about a number of objects
and construct a simple, unique tree diagram that expresses those objects' similarities and
differences. Cluster analysis involves sorting data so that members of the same cluster are most alike
and members of different clusters are least alike. In this way, each cluster describes the class to
which its members belong.
The results of cluster analysis are commonly reported in human-readable form as a dendrogram,
illustrated in Figure 7-10. In this dendrogram, Groups (D) and (E) are the most alike, as indicated by
the shortest bracket. The next level of similarity is between Group (F) and the (D)-(E) complex. In addition,
Groups (A) and (B) are similar. Group (G) shares the least similarity with the other groups.
Figure 7-10. Dendrogram Showing the Results of a Cluster Analysis. Groups
(D) and (E) show the greatest similarity, whereas Group (G) differs the
most from the other groups, based on cluster analysis criteria.
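The tree structure behind a dendrogram can be sketched with a small agglomerative procedure that repeatedly merges the two closest clusters and records the distance at which each merge occurs. The one-dimensional values assigned to the groups below are hypothetical, picked only so that the merge order mirrors the relationships described for the figure; single linkage is assumed as the clustering criterion, which the text does not specify.

```python
def single_linkage(points):
    """Agglomerative clustering on 1-D values.

    Returns a list of merges as (members_a, members_b, distance), in the
    order the merges happen, from most similar to least similar.
    """
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                distance = min(abs(points[a] - points[b])
                               for a in clusters[i] for b in clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        distance, i, j = best
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical measurements for the groups; not taken from the figure.
groups = {"A": 1.0, "B": 2.5, "D": 10.0, "E": 10.5, "F": 11.5, "G": 30.0}

merges = single_linkage(groups)
for members_a, members_b, distance in merges:
    print("+".join(members_a), "joins", "+".join(members_b), "at", distance)
```

Each recorded distance corresponds to the height of a bracket in the dendrogram: the first merge has the shortest bracket, and the group that joins last is the one least similar to the rest.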