Recent Advances of Data Biclustering with Application in Computational Neuroscience - Computational Neuroscience


vectors may not conform to each other. In this case, we should add values (possibly 0) to the vectors that lack the corresponding features, so that all vectors have the same length. In some applications, there is a large set of samples but only a limited number of features.

• Similarity Matrix. Both the rows and the columns of this data matrix correspond to the same set of samples, and each entry measures the similarity between the two corresponding samples. The matrix has the same number of rows and columns and is symmetric. It can be called a sample-by-sample matrix.

Note: this matrix can also be used as a dissimilarity matrix, with each entry denoting the dissimilarity between a pair of samples. There are many functions for computing the (dis)similarity entries, such as the Euclidean distance and the Mahalanobis distance. The similarity matrix can therefore be computed from the expression matrix.
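As a minimal sketch of how a dissimilarity matrix can be derived from an expression matrix, the Python snippet below pads samples that lack some features with 0 and then computes pairwise Euclidean distances. The helper names (`to_vectors`, `euclidean_matrix`), the dictionary-based sample format, and the toy values are assumptions made for illustration and do not come from the original text.

```python
import math

def to_vectors(samples, features, fill=0.0):
    """Turn each sample (a feature -> value dict) into a same-length vector,
    filling features the sample lacks with `fill` (0 by default)."""
    return [[s.get(f, fill) for f in features] for s in samples]

def euclidean_matrix(rows):
    """Return the symmetric sample-by-sample matrix of Euclidean distances."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            d[i][j] = d[j][i] = dist  # symmetry: d(i, j) == d(j, i)
    return d

# Hypothetical toy data: three samples; the second one lacks feature "f3".
features = ["f1", "f2", "f3"]
samples = [
    {"f1": 1.0, "f2": 2.0, "f3": 2.0},
    {"f1": 1.0, "f2": 2.0},
    {"f1": 4.0, "f2": 6.0, "f3": 2.0},
]
expression = to_vectors(samples, features)    # sample-by-feature matrix
dissimilarity = euclidean_matrix(expression)  # sample-by-sample matrix
```

A similarity matrix could be obtained from these distances with any decreasing transformation, for example 1 / (1 + d); that choice is likewise an assumption.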

Since the development of biclustering includes some time series models [38, 52], time series data are another kind of data used in biclustering. Such data can also be viewed as stored in a matrix in which rows denote samples and columns, from left to right, denote the observed time points.

For qualitative features, the data matrix is in some cases a kind of sign matrix. Some biclustering algorithms can still be used on it.

Sometimes, before an algorithm is run on the matrix, some preparation steps are applied, such as normalization, discretization, value mapping, and aggregation. The details of these data preparation operations are available in [16].
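To illustrate two of these preparation steps, here is a minimal Python sketch that normalizes each feature (column) to the range [0, 1] and then discretizes the result into a 0/1 matrix. Min-max scaling, the threshold of 0.5, and the function names are assumptions chosen for illustration; the procedures described in [16] may differ.

```python
def normalize_columns(matrix):
    """Min-max scale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*matrix))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j, v in enumerate(row)]
        for row in matrix
    ]

def discretize(matrix, threshold=0.5):
    """Map each entry to 1 if it reaches the threshold, otherwise to 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in matrix]

# Hypothetical expression matrix: 3 samples (rows) by 2 features (columns).
data = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
normalized = normalize_columns(data)
binary = discretize(normalized)
```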

In the following, unless otherwise stated, the data matrix refers to the first kind, the expression matrix.

6.1.3 Objective of Task

The objective of biclustering is to find biclusters in data. In clustering, the obtained clusters should have the property that the similarities among the samples within each cluster are maximized and the similarities between samples from different clusters are minimized.
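The clustering objective can be made concrete with a small Python sketch that, given a similarity matrix and a cluster assignment, computes the average similarity within clusters and between clusters. The function name, the use of plain averages, and the toy matrix are illustrative assumptions; actual clustering algorithms optimize a variety of criteria.

```python
def within_between(similarity, labels):
    """Return (average within-cluster similarity, average between-cluster
    similarity) over all unordered pairs of distinct samples."""
    within, between = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                within.append(similarity[i][j])
            else:
                between.append(similarity[i][j])

    def avg(values):
        return sum(values) / len(values) if values else 0.0

    return avg(within), avg(between)

# Hypothetical similarity matrix for four samples.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
good = within_between(similarity, [0, 0, 1, 1])  # groups similar samples
bad = within_between(similarity, [0, 1, 0, 1])   # splits similar samples
```

For the first assignment the within-cluster average exceeds the between-cluster average, which is the property a good clustering should have; for the second assignment the relation is reversed.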

For biclustering, the samples and features in each bicluster are highly related. This does not mean that the samples in a bicluster have no other features: the features in the bicluster are simply more pronounced in these samples, which may still share other features. Thus, in each bicluster, the relations between its samples and its features are closer than the relations between samples (features) from this bicluster and features (samples) from another bicluster.

Some biclustering algorithms allow one sample or feature to belong to several biclusters (called overlapping), while others produce exclusive biclusters. In addition, some algorithms require that each sample or feature belong to some bicluster (exhaustive), while others need not be exhaustive and may find only one submatrix, or several, in the data matrix to form the biclusters.
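These distinctions can be checked mechanically. In the Python sketch below, a bicluster is represented as a pair (set of sample indices, set of feature indices); this representation and the function names are assumptions for illustration. Overlap is interpreted in the sense of the text, namely a sample or a feature belonging to more than one bicluster.

```python
def is_exclusive(biclusters):
    """True if no sample and no feature belongs to more than one bicluster."""
    seen_rows, seen_cols = set(), set()
    for rows, cols in biclusters:
        if rows & seen_rows or cols & seen_cols:
            return False
        seen_rows |= rows
        seen_cols |= cols
    return True

def is_exhaustive(biclusters, n_samples, n_features):
    """True if every sample and every feature belongs to some bicluster."""
    rows = set().union(*(r for r, _ in biclusters))
    cols = set().union(*(c for _, c in biclusters))
    return rows == set(range(n_samples)) and cols == set(range(n_features))

# Hypothetical results on a data matrix with 4 samples and 3 features.
partition = [({0, 1}, {0, 1}), ({2, 3}, {2})]    # exclusive and exhaustive
overlapping = [({0, 1}, {0, 1}), ({1, 2}, {2})]  # sample 1 is shared, sample 3 is uncovered
```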
