Probabilistic Distance Clustering: Algorithm and Applications - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

making no assumptions about the densities, and using the same formulas

in all cases.

(b) In each EM iteration the density functions must be evaluated, requiring

KN function evaluations (in Step 1) where K is the number of densities

in the mixture. In comparison, the PDQ iterations are cheaper, requiring

no function evaluations.

(c) Because the EM iterations are costly, it is common to use another

method, e.g., the K -means method, as a pre-processor, to get closer

to the centers before starting EM. The PDQ Algorithm needs no such

switch, and works well from a cold start.

(d) If correct assumptions are made regarding the mixing distributions, then

the EM method has an advantage over the PDQ method, as illustrated in

Example 2.6 below.

(e) While the numerical comparison between the two algorithms should best

be left to others, our preliminary tests show the two algorithms to be

roughly equivalent in terms of the returned results, with the PDQ Algo-

rithm somewhat faster.

2.5. Numerical Experiments

In Examples 2.4-2.6 below the PDQ and EM Algorithms were applied to the

same data, in order to compare their performance.

The results are reported in

Tables 2.1-2.4.

These examples are typical representatives of many numerical

tests we did.

Both programs used here were written in MATLAB, the EM code by Tsui [16],

and the PDQ code by the first author.

The comparison is subject to the following limitations:

(a) The EM program code uses the K -means method (Hartigan [5]) as a

preprocessor to get a good start. The number of iterations and running

time reported for this program in Table 2.4 is just for the EM part, not

including the time for the K -means part.

(b) Our PDQ code is the first un-optimized, un-finessed version, a verbatim

implementation of Algorithm 1.

(c) The number of iterations depends on the stopping rule. In the PDQ Al-

gorithm, the stopping rule is Step 4 of Algorithm 1, and the number of

iterations will increase the smaller is . In the EM Algorithm the stop-

ping rule does involve also the convergence of the likelihood function,

and the effect of the tolerance is less pronounced.

Search WWH ::

Custom Search

Home