Biomedical Engineering Reference
Pattern Recognition and Discovery
Data mining is the process of identifying patterns and relationships that are often not obvious
in large, complex data sets. As such, data mining involves pattern recognition and, by extension,
pattern discovery. In bioinformatics, pattern recognition is most often concerned with the automatic
classification of character sequences that represent nucleotide bases or molecular structures,
and of 3D protein structures.
As illustrated in Figure 7-5, the pattern-recognition process starts with an unknown pattern, such as
a potential protein structure, and ends with a label for the pattern. From an information-processing
perspective, pattern recognition can be viewed as a data simplification process that filters extraneous
data from consideration and labels the remaining data according to a classification scheme.
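
The filter-then-label view described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the text: the feature (GC content), the 0.5 threshold, the labels "GC-rich" and "AT-rich", and the function names are all hypothetical choices made for the example.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def is_valid(seq: str) -> bool:
    """Filter step: keep only non-empty sequences made of A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= VALID_BASES

def extract_features(seq: str) -> dict:
    """Measure features chosen a priori (hypothetical: length and GC content)."""
    seq = seq.upper()
    counts = Counter(seq)
    gc_content = (counts["G"] + counts["C"]) / len(seq)
    return {"length": len(seq), "gc_content": gc_content}

def classify(seq: str, gc_threshold: float = 0.5):
    """Label a sequence, or return None if it is filtered out as extraneous."""
    if not is_valid(seq):
        return None
    features = extract_features(seq)
    # Hypothetical classification scheme: a single threshold on GC content.
    return "GC-rich" if features["gc_content"] >= gc_threshold else "AT-rich"

def recognize(sequences):
    """Run the whole process: drop extraneous inputs, label the rest."""
    labeled = {}
    for seq in sequences:
        label = classify(seq)
        if label is not None:
            labeled[seq] = label
    return labeled
```

Calling `recognize(["GGCC", "AATT", "NNNN"])` would drop `"NNNN"` in the filter step and label the other two sequences. A realistic classifier would use many more features and a trained model, but the shape of the process is the same: an unknown pattern goes in and a label comes out.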
Figure 7-5. The Pattern-Recognition and Discovery Process. Pattern
discovery differs from pattern recognition in that feature selection is
determined empirically under program control.
The major steps in the pattern recognition and discovery process are:
Feature Selection. Given a pattern, the first step in pattern recognition is to select a set of
features or attributes from the universe of available features that will be used to classify the
pattern. When pattern recognition is directed at known patterns, the researcher defines a
priori the features that will be used to distinguish the pattern from other data. Feature
selection often takes the form of exemplars or representative examples of the features that
will be measured, such as the tertiary geometry of a protein. In pattern discovery, which is
more complex than simple pattern recognition, feature selection is under program control.
Instead of an a priori definition of pattern attributes defining a class or group of data that are