Biomedical Engineering Reference
The tests can be binary (yes/no), as in Test 2, or multi-valued (high, medium, low), as in Test 1. In
operation, for example, a decision tree can be used to categorize a protein based on a combination of
molecular weight, length, and configuration. As illustrated in the figure, the terminal or leaf nodes
need not result in mutually exclusive categorizations of the input data. Both Test 2 and Test 7 classify
the input into category (A), for example.
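As a minimal sketch of the kind of tree described above, the following Python code applies one multi-valued test (molecular weight binned into low, medium, and high) followed by binary tests on length and configuration. The `Protein` fields, the thresholds, and the category labels are all hypothetical, chosen only for illustration; they are not taken from the figure. Note that two different leaves return category "A", so the leaves are not mutually exclusive categories.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    # All field names and units are assumptions made for this sketch.
    molecular_weight_kda: float
    length: int          # number of residues
    is_globular: bool    # stand-in for "configuration"


def weight_band(p: Protein) -> str:
    """Multi-valued test: bin molecular weight into low/medium/high."""
    if p.molecular_weight_kda < 20:   # illustrative threshold
        return "low"
    if p.molecular_weight_kda < 60:   # illustrative threshold
        return "medium"
    return "high"


def classify(p: Protein) -> str:
    """Walk a small hand-built decision tree and return a category label."""
    band = weight_band(p)
    if band == "low":
        # Binary test on length (illustrative cutoff)
        return "A" if p.length < 150 else "B"
    if band == "medium":
        # Binary test on configuration; a second leaf that yields "A"
        return "C" if p.is_globular else "A"
    return "D"


if __name__ == "__main__":
    print(classify(Protein(16.0, 141, True)))
    print(classify(Protein(45.0, 400, False)))
```

Both example calls end in category "A", although they reach it through different tests.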
A potential limitation of decision trees is their inability to represent relative frequencies of
occurrence. For example, with a very small training set, it is likely that the terminal leaves
of a complex tree are defined by chance alone. Consider the typical evolutionary tree that represents
speciation over the past several hundred million years. A single fossil may be responsible for a
bifurcation in the tree, even though the fossil may represent a relatively small, insignificant mutation
in a much larger population. In the tree representation, however, the two populations have equal
weights.
In some cases, this inability to represent the relative frequency of occurrence can be used to
advantage. For example, in classifying globins from a variety of species, multiple samples from the
same or closely related species may skew the relative abundance of some properties over others.
However, if these properties are represented as a decision tree, then the skew due to sample
anomalies can be avoided.
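The point can be illustrated with a small sketch, assuming a toy two-test tree whose features (`has_heme`, `length`), cutoff, and labels are hypothetical. One sample is repeated five times to mimic oversampling of a single species: the frequency counts are skewed by the repeats, whereas the set of categories distinguished by the tree is the same with or without them.

```python
from collections import Counter


def classify(has_heme: bool, length: int) -> str:
    """Toy two-test tree; features, cutoff, and labels are hypothetical."""
    if not has_heme:
        return "non-globin"
    return "short-chain" if length < 145 else "long-chain"


# Hypothetical samples as (has_heme, length). The first entry is repeated
# to mimic multiple samples from the same or closely related species.
samples = [(True, 141)] * 5 + [(True, 153), (False, 300)]

labels = [classify(h, n) for h, n in samples]
frequencies = Counter(labels)   # skewed by the repeated sample
categories = set(labels)        # what the tree itself distinguishes

# Removing the duplicates changes the counts but not the categories.
deduplicated = {classify(h, n) for h, n in set(samples)}

if __name__ == "__main__":
    print(frequencies)
    print(sorted(categories))
    print(deduplicated == categories)
```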
Hidden Markov Models
A powerful statistical approach to constructing classifiers that deserves a separate discussion is the
use of Hidden Markov Modeling. A Hidden Markov Model (HMM) is a statistical model for an ordered