Solving NP-Complete Problems by Harmony Search - Music-Inspired Harmony Search Algorithm: Theory and Applications


Fig. 7. Execution time of different algorithms on randomly generated networks

in comparison with other algorithms is tremendous and HSNPI outperforms GA when

the cost of generated multicast trees is considered.

7 Document Clustering

Fast, high-quality document clustering has become an increasingly important technique for enhancing search engine results, web crawling, unsupervised document organization, and information retrieval or filtering. Clustering involves dividing a set of documents into a specified number of groups. The documents within each group should exhibit a large degree of similarity, and the similarity among different clusters should be minimized. Some of the more familiar clustering methods are: partitioning algorithms based on dividing the entire data set into dissimilar groups, hierarchical methods, density-based and grid-based clustering, and some graph-based methods [34, 35].

In most document clustering algorithms, documents are represented using a vector-space model. In this model, each document d is considered to be a vector $d = \{d_1, d_2, \ldots, d_t\}$ in term-space (the set of document 'words'), where $d_i$ is the weight of dimension i in the vector space and t is the number of term dimensions. The most widely used weighting approach for term weights is the combination of Term Frequency and Inverse Document Frequency (TF-IDF) [36-38].

The similarity between two documents must be measured in some way if a clustering algorithm is to be used. The vector space model gives us a good opportunity for defining different metrics for similarity between two documents. The most common similarity metrics are Minkowski distances [39] and the cosine measure [36, 38, 40].
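The vector-space representation and the two families of similarity metrics mentioned above can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The TF-IDF variant used here (raw term count multiplied by log(N / document frequency)), the function names, and the toy corpus are all assumptions made for illustration, since the text does not fix a particular weighting formula.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenised documents.

    Assumed weighting (not specified in the text):
    raw term count * log(N / document frequency).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)


def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical toy corpus, already split into terms.
docs = [
    "harmony search solves clustering".split(),
    "harmony search solves routing".split(),
    "document clustering groups documents".split(),
]
vocab, vectors = tfidf_vectors(docs)

sim_01 = cosine_similarity(vectors[0], vectors[1])
sim_02 = cosine_similarity(vectors[0], vectors[2])
dist_01 = minkowski_distance(vectors[0], vectors[1], p=2)
print(f"cosine(d0, d1) = {sim_01:.3f}")
print(f"cosine(d0, d2) = {sim_02:.3f}")
print(f"euclidean(d0, d1) = {dist_01:.3f}")
```

In this toy corpus the first two documents share three terms, so their cosine similarity is higher than that of the first and third documents, which share only one. Note that the cosine measure is a similarity (larger means closer) while a Minkowski distance is a dissimilarity (smaller means closer), so a clustering algorithm must treat the two with opposite signs.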
