vectors may not conform to each other. In this case, we should add values
(possibly 0) to the vectors that lack the corresponding features so that all
vectors have the same length. In some applications, there is always a large set
of samples with a limited number of features.
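The padding step described above can be sketched as follows. This is a minimal illustration, assuming each sample is given as a mapping from feature names to values; the helper name pad_to_matrix and the example data are hypothetical and not taken from the text.

```python
def pad_to_matrix(samples, fill=0.0):
    """Align feature dicts into equal-length rows; missing features get `fill`."""
    # Collect the union of all feature names in a stable (sorted) order.
    features = sorted({name for sample in samples for name in sample})
    # Build one row per sample, inserting `fill` where a feature is absent.
    rows = [[sample.get(name, fill) for name in features] for sample in samples]
    return features, rows


# Hypothetical samples that do not share the same features.
samples = [
    {"f1": 2.0, "f2": 1.5},
    {"f2": 0.5, "f3": 3.0},
]
features, matrix = pad_to_matrix(samples)
```

Filling with 0 is only one possible choice; whether 0 is a sensible stand-in for a missing feature depends on the application.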
Similarity Matrix. This data matrix has both rows and columns corresponding to
a set of samples, with each entry measuring the similarity between the two
corresponding samples. It has the same number of rows and columns, and it is
symmetric. This matrix can be called a sample-by-sample matrix.
Note: this matrix can also be used as a dissimilarity matrix, with each entry
denoting the dissimilarity between a pair of samples. There are many functions
for computing the (dis)similarity entries, such as the Euclidean distance and
the Mahalanobis distance. The similarity matrix can therefore be computed from
the expression matrix.
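As a minimal sketch of how a dissimilarity matrix may be derived from an expression matrix, the following computes pairwise Euclidean distances. It assumes that rows of the expression matrix are samples and columns are features; the function name dissimilarity_matrix and the example values are hypothetical.

```python
import math


def dissimilarity_matrix(expression):
    """Return the sample-by-sample Euclidean distance matrix."""
    n = len(expression)
    d = [[0.0] * n for _ in range(n)]  # diagonal stays 0: zero self-distance
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(expression[i], expression[j])
            # Euclidean distance is symmetric, so fill both entries.
            d[i][j] = d[j][i] = dist
    return d


# Hypothetical expression matrix: 3 samples, 2 features.
expression = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]
D = dissimilarity_matrix(expression)
```

The result is square and symmetric, as stated for the sample-by-sample matrix. Another measure, such as the Mahalanobis distance, could be substituted for math.dist without changing the structure.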
Since the development of biclustering has come to include some time series
models [38, 52], time series data is another kind of data used in biclustering.
This data can also be viewed as stored in a matrix in which rows denote samples
and columns, from left to right, denote the observed time points.
For qualitative features, the data matrix is in some cases a kind of sign
matrix. Some biclustering algorithms can still be applied to it.
Sometimes, before an algorithm is run on the matrix, some preparation steps are
applied, such as normalization, discretization, value mapping, and aggregation;
the details of these data preparation operations are available in [16].
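To make two of these preparation steps concrete, here is a minimal sketch of one possible normalization (column-wise z-scores) and one possible discretization (thresholding to 0/1). These particular choices, the function names, and the example data are assumptions made for illustration; they are not necessarily the procedures described in [16].

```python
import statistics


def normalize_columns(matrix):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*matrix))
    means = [statistics.fmean(c) for c in columns]
    # A constant column has deviation 0; fall back to 1.0 to avoid dividing by 0.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]


def discretize(matrix, threshold=0.0):
    """Map each entry to 1 if it exceeds `threshold`, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in matrix]


# Hypothetical data matrix: 3 samples (rows), 2 features (columns).
data = [[1.0, 10.0], [2.0, 10.0], [3.0, 40.0]]
normalized = normalize_columns(data)
binary = discretize(normalized)
```

Normalizing before discretizing means a single threshold can be used for features measured on different scales.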
In the following, unless stated otherwise, the data matrix refers to the first
kind of matrix, the expression matrix.
6.1.3 Objective of Task
The objective of biclustering is to find biclusters in data. In clustering,
the obtained clusters should have the property that the similarities among the
samples within each cluster are maximized and the similarities between samples
from different clusters are minimized.
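The clustering objective stated above can be checked numerically for a given partition. The sketch below, under the assumption that a symmetric similarity matrix S and one cluster label per sample are available, compares the mean within-cluster similarity to the mean between-cluster similarity. The function name mean_similarities and the example values are hypothetical.

```python
def mean_similarities(S, labels):
    """Return (mean within-cluster, mean between-cluster) similarity.

    Assumes at least one pair of samples shares a cluster and at least
    one pair does not; otherwise a division by zero occurs.
    """
    within, between = [], []
    n = len(S)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            if labels[i] == labels[j]:
                within.append(S[i][j])
            else:
                between.append(S[i][j])
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical similarity matrix for 4 samples in 2 clusters.
S = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
within, between = mean_similarities(S, labels=[0, 0, 1, 1])
```

A good clustering in the sense described should give a within-cluster mean clearly larger than the between-cluster mean.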
For biclustering, the samples and features in each bicluster are highly related.
This does not mean that the samples in a bicluster have no other features; the
features in the bicluster are simply more pronounced for these samples, which
may still share other features. Thus, in each bicluster, the relations between
its samples and its features are closer than the relations between samples
(features) from this bicluster and features (samples) from another bicluster.
Some biclustering algorithms allow one sample or feature to belong to several
biclusters (called overlapping), while others produce exclusive biclusters.
In addition, some algorithms require that each sample or feature belong to
some bicluster, while others need not be exhaustive and may find only one or
several submatrices of the data matrix to form the biclusters.
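The distinctions above (overlapping versus exclusive, exhaustive versus non-exhaustive) can be tested on a given result. The following is a minimal sketch that assumes each bicluster is represented as a pair (set of sample indices, set of feature indices); this representation and the function names are hypothetical.

```python
def is_exclusive(biclusters):
    """True if no sample and no feature appears in more than one bicluster."""
    seen_rows, seen_cols = set(), set()
    for rows, cols in biclusters:
        if seen_rows & rows or seen_cols & cols:
            return False  # a sample or feature is shared: overlapping
        seen_rows |= rows
        seen_cols |= cols
    return True


def is_exhaustive(biclusters, n_samples, n_features):
    """True if every sample and every feature belongs to some bicluster."""
    rows = set().union(*(r for r, _ in biclusters))
    cols = set().union(*(c for _, c in biclusters))
    return rows == set(range(n_samples)) and cols == set(range(n_features))


# Hypothetical result: sample 1 belongs to both biclusters.
biclusters = [({0, 1}, {0, 1}), ({1, 2}, {2})]
overlapping = not is_exclusive(biclusters)
covers_all = is_exhaustive(biclusters, n_samples=3, n_features=3)
```

Here the result is overlapping, because sample 1 is shared, and exhaustive for a 3-by-3 data matrix. With a larger matrix the same biclusters would no longer be exhaustive.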