ViDaExpert is stand-alone software that is freely available online at
http://bioinfo-out.curie.fr/projects/vidaexpert/. It is a unique software tool
for visualizing multidimensional datasets that was developed by A. Zinovyev in
2001 as part of his Ph.D. thesis under the supervision of the mathematician A.N. Gorban,
then at the Institute of Computational Modeling of the Siberian Branch of the
Russian Academy of Sciences in Krasnoyarsk, Russia. The following description
of ViDaExpert is largely based on the lecture that Zinovyev gave at Rutgers in 2006
and on the lecture slides that he generously made available to me.
ViDaExpert analyzes a finite set of objects in a multidimensional space endowed
with some way of defining the distance (a metric) between the objects. ViDaExpert
utilizes a form of principal component analysis. One of the simplest objects that
can be embedded in data space is a line that is aligned in the direction of maximal
dispersion of the data. Such a line is referred to as the first principal component or axis.
The second principal component can be calculated as the line passing through the
middle of the first principal axis at a 90° angle to it, along the direction of the largest
remaining dispersion, and the third principal component
can be calculated as the line going through the intersection of the first and second
principal axes at 90° to both, and so on for higher principal components
(Zinovyev 2006).
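
The construction of successive principal axes can be illustrated with a minimal sketch. It assumes the standard formulation of principal component analysis, in which all axes pass through the data mean and are obtained from a singular value decomposition of the centered data. The function name `principal_axes` and the toy data are hypothetical and are not part of ViDaExpert.

```python
import numpy as np

def principal_axes(data):
    """Return the data mean, unit principal directions (one per row),
    and the variance of the data along each direction."""
    center = data.mean(axis=0)
    centered = data - center
    # Rows of vt are mutually perpendicular unit vectors, ordered by
    # decreasing dispersion of the data along them.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (len(data) - 1)
    return center, vt, variances

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points stretched mostly along the first coordinate.
points = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
center, axes, variances = principal_axes(points)
```

Here `axes[0]` is the direction of maximal dispersion, and each later row is at 90° to all earlier ones.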
The principal component can be viewed as a generalization of the concept of the
mean. The concept of the mean can be expressed in terms of a point, a set of points,
or even an object with an arbitrary topology. The mean, denoted $\langle X \rangle$, is
defined as the sum of all the values $X_i$, from $i = 1$ to $m$, divided by $m$, the number
of points or objects in the set: $\langle X \rangle = \frac{1}{m} \sum_{i=1}^{m} X_i$.
As a generalization of the mean value, we can
define the mean point as a point which minimizes a functional, the sum of the
squared distances between data points and the mean point. This definition is very
general. Instead of the points used in K-means clustering, we can use any object, or
several objects, positioned so as to minimize the sum of the squared distances from
the data points to the object. Such an object is called the principal object.
After finding the principal object, we can project data points onto the surface of the
object. When data points are so projected, we are in fact making a transition
between two spaces - from data points in a high-dimensional space to a lower-
dimensional space of the principal object (Zinovyev 2006).
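
Both ideas in this paragraph can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a straight line (the first principal axis) as the simplest principal object, and the helper names `sum_sq_dist` and `project_onto_line` are hypothetical. It shows that moving away from the mean point increases the sum of squared distances, and that projection replaces each high-dimensional point with a single coordinate along the line.

```python
import numpy as np

def sum_sq_dist(points, candidate):
    """Sum of squared distances from all data points to one candidate point."""
    return float(((points - candidate) ** 2).sum())

def project_onto_line(points, origin, direction):
    """Project points onto the line through `origin` along `direction`.

    Returns the 1-D coordinate of each point along the line and the
    projected points expressed in the original data space.
    """
    direction = direction / np.linalg.norm(direction)
    coords = (points - origin) @ direction
    projections = origin + np.outer(coords, direction)
    return coords, projections

rng = np.random.default_rng(1)
points = rng.normal(size=(100, 4))      # hypothetical 4-dimensional data
mean_point = points.mean(axis=0)

# Any shift away from the mean point increases the functional.
at_mean = sum_sq_dist(points, mean_point)
at_shifted = sum_sq_dist(points, mean_point + 0.1)

# Use the first principal axis as the simplest principal object.
_, _, vt = np.linalg.svd(points - mean_point, full_matrices=False)
coords, projections = project_onto_line(points, mean_point, vt[0])
```

After projection, `coords` holds one number per data point: the transition from the 4-dimensional data space to the 1-dimensional space of the line.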
The principal object (also called principal manifold or principal grid) is rather
rigid, but ViDaExpert constructs a flexible principal object. To accomplish this
goal, Zinovyev and Gorban employed the elastic net (Zinovyev 2006).
For simplicity, it is usually assumed that the stretching and bending coefficients
are equal for all edges and ribs. This leaves only two parameters to be manipulated
in constructing the principal manifold using ViDaExpert. The first parameter
restricts the total length, or area, or the volume of the principal manifold. The
second parameter tends to smooth out the topology of the manifold. One important
point is that the energy functionals are all quadratic, which means that they can be
optimized in one step by solving a system of linear equations. This makes
ViDaExpert fast; in fact, it is one of the fastest methods now available to construct
optimal principal manifolds (Zinovyev 2006).
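
Why a quadratic energy leads to a linear system can be shown with a minimal sketch for a one-dimensional chain of nodes. This is not ViDaExpert's code. It assumes an energy made of three quadratic terms: the mean squared distance from each data point to its assigned node, a stretching term (coefficient `stretch`) summed over edges between neighboring nodes, and a bending term (coefficient `bend`) summed over ribs, that is, triples of consecutive nodes. The function names and parameter values are hypothetical. For a fixed assignment of points to nodes, setting the gradient to zero gives one linear solve. Repeating the assignment and the solve a few times is a further assumption of this sketch.

```python
import numpy as np

def mean_sq_dist_to_nodes(points, nodes):
    """Mean squared distance from each point to its nearest node."""
    dists = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return float(dists.min(axis=1).mean())

def fit_elastic_chain(points, nodes, stretch=0.01, bend=0.01, n_iter=20):
    """Fit a chain of nodes to the data by minimizing an assumed quadratic
    energy: approximation + stretch * edge term + bend * rib term."""
    nodes = np.asarray(nodes, dtype=float)
    n_points, n_nodes = len(points), len(nodes)
    # First differences act on edges, second differences on ribs.
    edges = np.diff(np.eye(n_nodes), axis=0)
    ribs = np.diff(np.eye(n_nodes), n=2, axis=0)
    penalty = stretch * edges.T @ edges + bend * ribs.T @ ribs
    for _ in range(n_iter):
        # Assign every data point to its nearest node.
        dists = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        counts = np.bincount(nearest, minlength=n_nodes)
        sums = np.zeros_like(nodes)
        np.add.at(sums, nearest, points)   # per-node sum of assigned points
        # Zero gradient of the quadratic energy is a linear system in the nodes.
        system = np.diag(counts / n_points) + penalty
        nodes = np.linalg.solve(system, sums / n_points)
    return nodes

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=300)
# Hypothetical toy data scattered around a parabola.
points = np.column_stack([x, x ** 2 + rng.normal(scale=0.05, size=300)])
initial = np.column_stack([np.linspace(-1.0, 1.0, 10), np.zeros(10)])
fitted = fit_elastic_chain(points, initial)

error_before = mean_sq_dist_to_nodes(points, initial)
error_after = mean_sq_dist_to_nodes(points, fitted)
```

Raising `stretch` shortens the chain, and raising `bend` straightens it, which mirrors the roles of the two parameters described above.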