Data Mining - Bioinformatics Computing

Biomedical Engineering Reference

In-Depth Information

The challenge of using a neural network to recognize and categorize data, especially novel data that

haven't been presented to the system before, is that of validating the results and of communicating

the rationale behind the results to the user. The greatest drawback of neural networks is that it's

practically impossible to assess the significance of what's happening inside of a complex network.

Even though the "wiring" of the nodes may be known, the relevance of changes in the strength of the

connections is difficult to assess, even when the strengths are known. As a result, the inner workings

of a neural network are difficult to validate.

Because a pure neural network presents such a formidable validation challenge, many neural network

data-mining systems are used in conjunction with rule-based expert systems that contain human-

readable rules in the form:

IF condition THEN outcome

These hybrid systems can categorize novel patterns and provide researchers with insight into the

operation of the biological system. The challenge in creating hybrid classification systems is

integrating the neural networks and rule-based expert systems in a way that doesn't compromise

classification performance while providing enough information on internal operation to allow the user

to assess the validity of the classification results. One approach to maximizing performance is to

develop a neural network and then use a tool that converts the network into a rule base that can be

compiled in C++ or Assembly language.

Statistical Methods

The statistical methods used to support data mining are generally some form of feature extraction,

classification, or clustering. Statistical feature extraction is concerned with recovering the defining

data attributes that may be obscured by imperfect measurement, improper data processing, or noise

in the data.

A variety of statistical pattern-classification methods may be applied to data mining. For example,

probabilistic classifiers are based on the principle that a pattern should be assigned to the class that

is most probable. Bayesian techniques that estimate the joint probability of distributions can also be

used to assess this probability. Although this method of classification generally provides excellent

results, it has a major drawback of requiring more complete data than other methods.

Search WWH ::

Custom Search

Home