Solving NP-Complete Problems by Harmony Search - Music-Inspired Harmony Search Algorithm: Theory and Applications - page 65


document clustering. Particle swarm optimization (PSO) [44] is another computational intelligence method that has been applied to image clustering and other low-dimensional datasets in [39, 45, 46] and to document clustering in [42]. HS is employed for document clustering in [47, 48].

To compare the quality and speed of different clustering algorithms, several well-known datasets are available and have been used. For every dataset, before the clustering algorithm is applied, very common words (stop words) are stripped out completely, different forms of a word are reduced to one canonical form using Porter's algorithm, and the documents are then converted to the vector space model.
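The preprocessing steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the cited experiments: the stop-word list is a tiny hypothetical sample, `crude_stem` is a simple suffix stripper standing in for Porter's algorithm (it is not Porter's algorithm), and raw term frequency is assumed as the vector weighting because the text does not name one.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def crude_stem(word):
    """Strip one common suffix. A stand-in for Porter's algorithm, NOT Porter itself."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and reduce words to a canonical form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def to_vectors(docs):
    """Map documents to term-frequency vectors over a shared, sorted vocabulary."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = to_vectors(
        ["The cats are clustering", "A cat clustered the documents"]
    )
    print(vocab)
    print(vectors)
```

In practice a full Porter stemmer and a weighting scheme such as TF-IDF would likely replace the stand-ins used here.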

To demonstrate the document clustering accuracy in comparison to the best contemporary methods, five datasets are selected from different known sources. Datasets DS1 and DS2 are from TREC-5, TREC-6, and TREC-7 [49]; DS3 was derived from the San Jose Mercury newspaper articles that are distributed as part of the TREC collection (TIPSTER); DS4 is selected from the DMOZ collection; and DS5 is a collection of 10,000 messages collected from 10 different Usenet newsgroups (1,000 messages from each). After preprocessing, a total of 9,249 documents remain in DS5.

Figure 8 compares five different algorithms on the selected datasets. These algorithms include HS clustering [47], K-means (the best-known partitioning algorithm), genetic K-means (GA) [50], particle swarm optimization based clustering (PSO) [42], and a von Mises-Fisher generative model based algorithm (GM) [51, 52]. Figure 8 shows the results of applying these algorithms to the five datasets in terms of the normalized ADDC of each algorithm. The results show that the HS method outperforms GA, K-means, and PSO on all datasets, while the GM algorithm generates higher quality clusters than the HS based algorithm on dataset DS2.
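The excerpt does not define ADDC on this page. The sketch below assumes it denotes the average distance of documents to their cluster centroid, averaged over clusters, and it assumes cosine distance as the document distance; both are assumptions, and the normalization used for the figure is not reproduced. The function names are hypothetical. Under this reading, smaller values indicate more compact clusters.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def addc(vectors, labels):
    """Assumed ADDC: mean over clusters of the mean document-to-centroid distance."""
    clusters = {}
    for vec, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vec)
    total = 0.0
    for members in clusters.values():
        center = centroid(members)
        total += sum(cosine_distance(center, m) for m in members) / len(members)
    return total / len(clusters)


if __name__ == "__main__":
    docs = [[1, 0], [1, 0], [0, 1], [0, 1]]
    print(addc(docs, [0, 0, 1, 1]))  # identical documents grouped together
    print(addc(docs, [0, 1, 0, 1]))  # dissimilar documents grouped together
```

Comparing the two calls shows the intended behavior: grouping identical documents gives a lower value than mixing dissimilar ones.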

[Figure 8: bar chart comparing K-means, Harmony, GA, PSO, and GM on datasets DS1 to DS5 (horizontal axis: Dataset; vertical axis scaled from 0 to 1).]

Fig. 8. Quality of clustering generated by various algorithms
