A Novel Clustering Approach Using Hadoop Distributed Environment - Computational Intelligence Techniques for Comparative Genomics



Table 1 Comparison of classifiers

Classifier                   # Correctly classified samples   Correct classification rate (%)
K-Means with canopy          172                              79.2
Fuzzy C-Means with canopy    186                              84.8

We have attempted to show the difference in execution times on a single node
and on multiple nodes, along with a comparison between Hard Clustering
(K-Means) and Soft Clustering (Fuzzy C-Means) techniques. The experimentation
is carried out on datasets of 1,000, 100,000, 1,000,000, and 10,000,000
records. A comparison of the number of classified samples and the correctness
of the classification for both the K-Means and Fuzzy C-Means algorithms with
canopy is given in Table 1. From the results in Table 1, it is clear that FCM
with canopy is more effective than K-Means with canopy. The experimentation is
done in an Ubuntu 12.10, Hadoop 0.20.1 and Mahout 6 environment using Java 7.
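To make the distinction between hard and soft clustering concrete, the following is a minimal Python sketch; it is not the Mahout implementation used in the experiments. It assumes one-dimensional points, fixed cluster centers, and the commonly used fuzziness exponent m = 2. The function names `hard_assign` and `fuzzy_memberships` and all sample values are hypothetical.

```python
def hard_assign(point, centers):
    """K-Means style: return the index of the single nearest center."""
    distances = [abs(point - c) for c in centers]
    return distances.index(min(distances))


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-Means style: return a membership degree for every cluster.

    Uses the standard update u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)),
    so the degrees are non-negative and sum to 1.
    """
    distances = [abs(point - c) for c in centers]
    # A point lying exactly on one or more centers is shared among them.
    zeros = distances.count(0.0)
    if zeros:
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


# Illustrative values only (assumption, not data from the experiments).
centers = [0.0, 10.0]
point = 4.0
label = hard_assign(point, centers)              # one cluster index
memberships = fuzzy_memberships(point, centers)  # one degree per cluster
```

In this sketch the hard assignment commits the point to a single cluster, whereas the fuzzy memberships keep a graded degree for each cluster, which is the property that separates Fuzzy C-Means from K-Means.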

In Fig. 2, the experimentation is done using datasets of different sizes, and
the time to complete the clustering process is analyzed. The graph compares
the K-Means with canopy and Fuzzy C-Means with canopy techniques. It is
observed that the time taken to cluster the data is almost equal for smaller
datasets. But as the size of the dataset increases, the time taken by the
Fuzzy C-Means technique becomes lower than that of the K-Means technique.

Hence, from Fig. 3 and Table 1, it is evident that the proposed method, the
Fuzzy C-Means with canopy technique, is more efficient than the K-Means with
canopy technique.
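Both techniques compared here are run "with canopy". As a rough illustration of what a canopy step does, the sketch below groups points using a loose threshold and a tight threshold and returns the canopy centers, which could then seed K-Means or Fuzzy C-Means. It assumes one-dimensional points and absolute distance; the function name `canopy_centers`, the thresholds `t1` and `t2`, and the sample data are hypothetical and do not reproduce the Mahout job or its parameters.

```python
def canopy_centers(points, t1, t2):
    """Return canopies as (center, members) pairs.

    t1 is the loose threshold (canopy membership) and t2 the tight one
    (removal from the candidate list); requires t1 > t2 >= 0.
    """
    if not t1 > t2 >= 0:
        raise ValueError("thresholds must satisfy t1 > t2 >= 0")
    remaining = list(points)
    canopies = []
    while remaining:
        # Deterministic choice of the next center, for reproducibility.
        center = remaining[0]
        members = [p for p in remaining if abs(p - center) <= t1]
        canopies.append((center, members))
        # Points within the tight threshold cannot seed another canopy.
        remaining = [p for p in remaining if abs(p - center) > t2]
    return canopies


# Illustrative data only (assumption, not taken from the experiments).
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
canopies = canopy_centers(points, t1=3.0, t2=1.5)
seeds = [center for center, _ in canopies]
```

The resulting `seeds` list gives one starting center per canopy, so the later clustering pass does not need the number of clusters or its initial centers to be guessed by hand.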

Fig. 3 Graph showing the number of documents versus time required to process
