where Z_i and Z_j are z-scores from the marginal distributions, and f(Z_i; Z_j) is the
joint likelihood measure.
Pathway Analysis: Genes encode proteins that in most cases take part in
well-defined metabolic, regulatory or signalling pathways. The correct functioning
of a pathway depends on the overlap of the expression profiles of all genes involved
in that process. In other words, when the pathway is required, all of its proteins
must be present. It follows that finding groups of coexpressed genes facilitates the
identification of genes belonging to the same cellular process, i.e. the same pathway.
In the network framework, this corresponds to identifying densely interconnected
regions in the coexpression network. There are many different algorithms for
clustering high-dimensional data, and most of them are implemented in the common
commercial and free mathematical/statistical software packages. Among the best
known are K-means, fuzzy c-means and quality threshold (QT) clustering [33]. QT is
an alternative method of partitioning data, invented for gene clustering, which does
not require specifying the number of clusters. A more recent algorithm with the same
useful property is Markov Clustering (MCL) [34], which has been used successfully to
cluster networks based on both sequence homologies, e.g. [35–37], and, as in this
case, expression profiles [38, 39].
MCL Clustering: A natural property of clusters is that most edges are intra-cluster
and only a few are inter-cluster. This implies that random walks on the graph will
rarely go from one cluster to another. The MCL algorithm exploits this feature: it
computes the probabilities of random walks through the graph by iterating two
operators, expansion and inflation, on a stochastic matrix. Expansion of a stochastic
matrix (taking a power of the matrix) corresponds to computing longer random walks.
It associates new probabilities with all pairs of connected nodes, where one node is
the point of departure and the other is the destination. Since longer paths are more
common within clusters than between them, intra-cluster probabilities will often be
relatively large. Inflation, which raises each entry of the matrix to a power, then
boosts the probabilities of intra-cluster walks and demotes those of inter-cluster
walks. After several iterations, each followed by normalization of the matrix to keep
it stochastic, the graph eventually separates into different clusters. The main
parameter of MCL is the granularity, which sets the strength of the inflation step;
increasing it increases the tightness of the clusters. The parameter controlling
expansion can also be changed.
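
The expansion/inflation loop described above can be illustrated with a short Python sketch using NumPy. This is only a minimal illustration and not the reference MCL implementation of [34]: the function names, the added self-loops, the default parameter values, the convergence tolerance and the rule used to read clusters from the converged matrix are all assumptions made for this example.

```python
import numpy as np


def normalize_columns(m):
    """Rescale every column to sum to 1 so the matrix stays column-stochastic."""
    return m / m.sum(axis=0, keepdims=True)


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Illustrative MCL loop on a symmetric adjacency matrix (hypothetical helper).

    expansion : integer matrix power used in the expansion step
    inflation : element-wise power used in the inflation step (granularity)
    """
    a = np.asarray(adjacency, dtype=float)
    # Assumption: add self-loops before normalizing, a common convention
    # that also guarantees no column sums to zero.
    m = normalize_columns(a + np.eye(a.shape[0]))

    for _ in range(max_iter):
        previous = m
        # Expansion: a matrix power, i.e. longer random walks.
        m = np.linalg.matrix_power(m, expansion)
        # Inflation: element-wise power, then renormalization to keep
        # the matrix stochastic.
        m = normalize_columns(np.power(m, inflation))
        if np.allclose(m, previous, atol=tol):
            break

    # Assumption: rows that keep non-negligible mass act as "attractors";
    # the columns with mass in such a row are taken as one cluster.
    clusters = set()
    for row in m:
        members = tuple(int(i) for i in np.flatnonzero(row > 1e-4))
        if members:
            clusters.add(members)
    return sorted(clusters)
```

Calling `mcl(adjacency)` on a small graph made of two densely connected groups joined by a single edge would be expected to return one tuple of node indices per group. Raising `inflation` strengthens the inflation step and tends to give tighter, smaller clusters, while `expansion` controls the length of the random walks considered at each iteration.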
2.4. Protein-Protein Interaction Networks
Protein interactions are crucial for all levels of cellular function, including
architecture, regulation, metabolism, and signaling. Therefore, protein interaction
maps represent essential components of post-genomic toolkits needed for understanding