Inferring the Topology of Gene Regulatory Networks: An Algebraic Approach to Reverse Engineering - Mathematical Concepts and Methods in Modern Biology

Biology Reference

In-Depth Information

distance between two vector entries v and w will be the one-dimensional Euclidean

distance

. Top-down clustering algorithms start from the entire data set and

iteratively split it until either the degree of similarity reaches a certain threshold or

every group consists of one object only. For the purpose of data analysis, it is imprac-

tical to let the clustering algorithm produce clusters containing only one real value.

The iteration at which the algorithm is terminated is crucial since it determines the

degree of the discretization. SLC with the Euclidean distance function has one major

advantage: very little starting information is needed (only distances between points).

In addition, being a hierarchical clustering procedure it lends itself to adjustment in

case that clusters need to be split or merged. It may result, however, in a discretiza-

tion where most of the points are clustered into a single partition if they happen to be

relatively close to one another. Another problemwith SLC is that its direct implemen-

tation takes D , the desired number of discrete states, as an input. However, we would

like to choose D as small as possible, without losing information about the system

dynamics and the correlation between the variables, so that an essentially arbitrary

choice is unsatisfactory.

In [ 44 ], a hybrid method for discretization of short time series of experimental data

into a finite number of states was introduced. It is a modification of the SLC algorithm:

it begins by discretizing a vector in the same way as SLC does but instead of providing

D as part of the input, the algorithm contains termination criteria which determine

the appropriate value of D . After that each discrete state is checked for information

content and if it is determined that this content can be considerably increased by

further discretization, then the state is separated into two states in a way that may not

be consistent with SLC.

If more than one vector is to be discretized, the algorithm discretizes each vector

independently. (For details on multiple vector discretization, see [ 44 ]). If the vector

contains m distinct entries, a complete weighted graph on m vertices is constructed,

where a vertex represents an entry and an edge weight is the Euclidean distance

between its endpoints. The discretization process starts by deleting the edge(s) of

highest weight until the graph gets disconnected. If there is more than one edge

labeled with the current highest weight, then all of the edges with this weight are

deleted. The order in which the edges are removed leads to components, in which

the distance between any two vertices is smaller than the distance between any two

components, a requirement of SLC. We define the distance between two components

G and H to be dist

| v − w |

.The output of the algorithm

is a discretization of the vector, in which each cluster corresponds to a discrete state

and the vector entries that belong to one component are discretized into the same

state.

Example 3.12.

(

) =

min

−

∈

}

is to be discretized.

The diagram obtained by the SLC algorithm is given in Figure 3.3 . The complete

weighted graph in Figure 3.4 corresponds to iteration 0 of the diagram.

Having disconnected the graph, the next task is to determine if the obtained degree

of discretization is sufficient; if not, the components need to be further disconnected in

Suppose that vector v

= (

)

Mathematical Concepts and Methods in Modern Biology

Search WWH ::

Custom Search

Home