1.1. Introduction
The central idea behind graph-modeled data clustering is to depict the similarity
between a set of entities as a graph: Each vertex represents an entity—such as a
gene or protein—and two vertices are connected by an edge if the entities that they
represent have some (context-specific) similarity; for instance, two genes have a
similar expression profile or two proteins have a high sequence similarity. Groups
of highly connected vertices in the resulting graph represent clusters of mutually
similar entities. Hence, by detecting these dense groups, one can identify the
clusters in the graph-encoded data.
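To make the modeling step concrete, the following sketch (in Python, purely for
illustration) builds such a similarity graph by thresholding a pairwise
similarity function and then extracts connected components as a crude stand-in
for dense vertex groups; the threshold-based construction and the use of
components instead of a stricter density notion are assumptions of this sketch,
not a method prescribed by the text.

    from itertools import combinations

    def similarity_graph(entities, similarity, threshold):
        """Adjacency list with an edge for each pair of entities whose
        similarity reaches the (hypothetical) threshold."""
        graph = {e: set() for e in entities}
        for a, b in combinations(entities, 2):
            if similarity(a, b) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
        return graph

    def connected_components(graph):
        """Enumerate maximal connected vertex groups via depth-first search."""
        seen, groups = set(), []
        for start in graph:
            if start in seen:
                continue
            stack, group = [start], set()
            while stack:
                v = stack.pop()
                if v not in seen:
                    seen.add(v)
                    group.add(v)
                    stack.extend(graph[v] - seen)
            groups.append(group)
        return groups

In practice the similarity function is context-specific (for instance, a
correlation of expression profiles or a sequence-alignment score), and the
clusters sought are usually denser than mere connected components; identifying
such genuinely dense groups is exactly the hard part discussed next.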
Graph-modeled data clustering has been shown to have useful applications in
many areas of bioinformatics, including the analysis of gene expression [8, 16,
65, 66], proteins [45, 46], gene networks [68], allergic reactions [9], and marine
ecosystems [54]. There is a catch, however: Most problems that are concerned
with the detection of cluster structures in a graph are known to be NP-hard, that
is, there is probably no algorithm that can solve all instances efficiently [32]. Thus,
whenever such a problem is encountered and large instances need to be solved, it is
common to employ heuristic algorithms [58], approximation algorithms [5, 67], or
similar techniques. These usually come with disadvantages: either the solutions
are not guaranteed to be optimal, or there are no useful guarantees on the
algorithm's running time. Further, approximation algorithms and, to some extent,
heuristic algorithms are not suited to enumerative tasks. There are many
scenarios where these disadvantages are too severe, that is, where we need to
solve a combinatorially hard problem both optimally and reasonably efficiently.
For some combinatorial problems, this can be
achieved by means of fixed-parameter algorithms [25, 30, 60]. These are based
on the observation that not all instances of an NP-hard problem are equally hard to
solve; rather, this hardness depends on the particular structure of a given instance.
In contrast to “classical” computational complexity theory, which views problem
instances only in terms of their size, fixed-parameter algorithms and the
underlying theory of fixed-parameter tractability (FPT) reflect such differences
in structural hardness by expressing them through a so-called parameter, which
is usually a nonnegative integer denoted k.
Whenever the parameter k turns out to be small, fixed-parameter algorithms
may solve an NP-hard problem quite fast (sometimes even in linear time), with
provable bounds on the running time and a guarantee that the solution obtained
is optimal. More precisely, a size-n instance of a fixed-parameter tractable
problem can be solved in f(k) · p(n) time, where f is a function depending
solely on k, and p(n) is a polynomial in n.
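To illustrate what an f(k) · p(n) running time looks like, consider the classic
search-tree algorithm for the Vertex Cover problem, a standard textbook example
of fixed-parameter tractability (chosen here only for illustration; it is not a
problem from this section). Every edge must have at least one endpoint in the
cover, so branching on the two endpoints with a decremented budget k yields a
search tree of size O(2^k) with polynomial work per node, i.e., O(2^k · m) time
overall on a graph with m edges. The sketch below, again in Python, follows
this branching scheme.

    def vertex_cover(edges, k):
        """Return a vertex cover of size at most k, or None if none exists;
        'edges' is a set of frozensets, each containing two vertices."""
        edge = next(iter(edges), None)
        if edge is None:
            return set()        # all edges covered: the empty set suffices
        if k == 0:
            return None         # budget exhausted while edges remain
        u, v = edge             # one endpoint of this edge must be chosen
        for w in (u, v):
            remaining = {e for e in edges if w not in e}
            cover = vertex_cover(remaining, k - 1)
            if cover is not None:
                return cover | {w}
        return None

    # A path a-b-c-d has a vertex cover of size 2, for instance {b, c}.
    edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d")]}
    print(vertex_cover(edges, 2))

Here f(k) = 2^k and p(n) covers the per-node edge filtering, so for small k the
algorithm is fast even on large graphs; this is precisely the point of
parameterizing by structural hardness rather than by instance size alone.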