Data Mining - Bioinformatics Computing

Biomedical Engineering Reference

In-Depth Information

Figure 7-7. Induction-Based Classification. Using changes in entropy (a

measure of disorder) as an organizational heuristic, the induction algorithm

divides the unorganized data (top left) first by color and then by shape.

The alternative, bifurcating the circles and squares initially by shape would have resulted in a split of

10 to 20, which is less than the spread (increase in entropy) associated with a 14-to-16 split. In a

typical bioinfomatics data-mining problem, there may be 10 or more attributes to consider, according

to entropy change or some other driving heuristic.

Genetic Algorithms

Genetic algorithms are based on evolutionary principles wherein a particular function or definition

that best fits the constraints of an environment survives to the next generation, and the other

functions are eliminated. This iterative process continues indefinitely, allowing the algorithm to adapt

dynamically to the environment as needed. Genetic algorithms evaluate a large number of solutions

to a problem that are generated at random. The members of the solution population with the highest

fitness scores are allowed to "mate" with crossovers and mutations, creating the next generation.

Figure 7-8 illustrates the typical operation of a genetic operation. In this example, the possible

solutions to a problem defined by the fitness function are represented by bit strings. Each bit

represents the presence or absence of some quality that is mapped to the real-world solution. If

there is a need to represent gradations of quantities, then integers or floating-point variables could

be used instead of bit strings. However, in this example, 12 bits are used to represent the problem

matrix.

Search WWH ::

Custom Search

Home