is reduced to small subgroups, and research on each subgroup becomes easier and
more direct. Clustering has been widely studied in the past 20 years; a general
review of clustering is given by Jain et al. [31], and a survey of clustering
algorithms is available from Xu et al. [57]. Future challenges in biological
networks are discussed in the volume edited by Chaovalitwongse et al. [9].
However, clustering groups only the objects, without considering the features
that each object may have. In other words, clustering compares two objects by
the features they share, without describing the features on which they differ. A
method that groups the objects and the features simultaneously is called
biclustering: a specific group of objects is associated with a specific group of
features. More precisely, biclustering finds a subset of objects and a subset of
features such that these objects are related to these features to some level.
Such subsets are called biclusters. Moreover, biclustering does not require
objects in the same bicluster to behave similarly over all possible features,
only to exhibit strongly the specific features of this bicluster.
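The last point can be illustrated with a small sketch. The data values, the helper names submatrix and max_column_spread, and the choice of "largest spread within a column" as a similarity measure are all assumptions made for illustration; they are not a definition taken from the text.

```python
# Toy data: rows are objects, columns are features (values are made up).
data = [
    [9.0, 8.5, 0.2, 5.0],   # object 0
    [9.1, 8.4, 7.0, 0.1],   # object 1
    [0.3, 0.2, 3.0, 4.0],   # object 2
    [8.9, 8.6, 1.0, 9.0],   # object 3
]

def submatrix(matrix, rows, cols):
    """Return the entries of `matrix` restricted to the given rows and columns."""
    return [[matrix[r][c] for c in cols] for r in rows]

def max_column_spread(block):
    """Largest (max - min) over the columns of `block`; small means the rows agree."""
    return max(max(col) - min(col) for col in zip(*block))

# A hypothetical bicluster: objects {0, 1, 3} together with features {0, 1}.
bicluster_rows = [0, 1, 3]
bicluster_cols = [0, 1]

inside = submatrix(data, bicluster_rows, bicluster_cols)
all_features = submatrix(data, bicluster_rows, range(len(data[0])))

# The same objects agree closely on the bicluster's features
# but not on the full feature set.
spread_inside = max_column_spread(inside)
spread_all = max_column_spread(all_features)
print(spread_inside, spread_all)
```

Here the three objects would not necessarily be grouped by a clustering over all four features, yet they form a tight group once attention is restricted to features 0 and 1.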
Besides the differences from clustering mentioned above, biclustering can also
find hidden features and assign them to particular subsets of objects.
Biclustering is related to, but different from, other data mining techniques
such as classification, feature selection, and outlier detection. Classification
is a kind of supervised clustering, while most algorithms used in biclustering
are unsupervised; for some supervised biclustering methods see [4, 40].
The biclustering problem is to find biclusters in data sets. In some of the
literature it goes by other names, such as co-clustering or two-mode clustering.
6.1.2 Data Input
Usually, we call the objects samples. Samples have different features, and each
sample may or may not have a given feature. The level to which a sample has a
specific feature is called its expression level. In the real world, samples may
have quantitative or qualitative features. The expression levels of quantitative
features can easily be expressed as numerical data, while qualitative features
must be transformed into data by some scale of measurement. Some biclustering
algorithms accept qualitative features.
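As a minimal sketch of such a transformation, a qualitative feature can be mapped onto an ordinal scale. The labels, the numeric values, and the names SCALE and encode below are hypothetical and chosen only for illustration.

```python
# Hypothetical ordinal scale for a qualitative feature; the labels and
# numeric values are illustrative assumptions, not taken from the text.
SCALE = {"absent": 0.0, "low": 1.0, "medium": 2.0, "high": 3.0}

def encode(levels, scale=SCALE):
    """Map qualitative labels to numeric expression levels."""
    return [scale[label] for label in levels]

# One qualitative feature observed in four samples.
qualitative = ["low", "high", "absent", "medium"]
numeric = encode(qualitative)
print(numeric)
```

An ordinal mapping like this assumes the labels have a natural order; for unordered categories a different encoding would be needed.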
Biclustering algorithms mainly start from matrices. Two kinds are usually
used, and the first is the more likely to be used in biclustering.
Expression Matrix. This data matrix has rows corresponding to samples and
columns corresponding to features, with each entry measuring the expression
level of a feature in a sample. Each row is called the feature vector of the
sample. We can also call this matrix the sample-by-feature matrix.
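The layout described above can be sketched as a plain list of rows. The sample names, feature names, values, and the helper expression_level are hypothetical and serve only to show which index refers to what.

```python
# Hypothetical measurements: for each sample, the level of each feature.
samples = {
    "s1": {"f1": 2.0, "f2": 0.5, "f3": 1.0},
    "s2": {"f1": 0.0, "f2": 3.5, "f3": 1.5},
}

# Fix an order for rows (samples) and columns (features).
sample_names = sorted(samples)
feature_names = ["f1", "f2", "f3"]

# Sample-by-feature matrix: row i is the feature vector of sample i.
matrix = [[samples[s][f] for f in feature_names] for s in sample_names]

def expression_level(sample, feature):
    """Look up the entry for a (sample, feature) pair in the matrix."""
    return matrix[sample_names.index(sample)][feature_names.index(feature)]

print(matrix[0])                      # feature vector of sample "s1"
print(expression_level("s2", "f2"))   # expression level of "f2" in "s2"
```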
Sometimes, the matrix is formed from the feature vectors of all samples, and
the level of each feature in each sample is observed directly. Generally we
just scale these vectors and then put them together to form a matrix if all
vectors have the same length, which means they have the same set of features.
However, the feature