Over the course of the past decade, many technologies have promised to help with the processing and analysis of the vast amounts of information we have [1], and most of these technologies have come up short. We know this because, as programmers focused on data, we have tried them all. Many approaches have been proprietary, resulting in vendor lock-in. Some approaches were promising but could not scale to handle large datasets, and many were hyped so heavily that they could not meet expectations or were simply not ready for prime time.
When Apache Hadoop [2] entered the scene, however, everything was different. Hadoop is an open-source framework that had already found incredible success in massively scalable commercial applications. Based on the MapReduce [3, 4] programming model, which enables us to bring the processing to data distributed across a scalable cluster of machines, we have found much success in performing complex data analysis in ways that we have not been able to in the past.
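As a rough illustration of the MapReduce model (a single-machine Python simulation of the pattern, not the authors' Hadoop code; all function names here are our own), map emits key-value pairs, a shuffle step groups values by key, and reduce aggregates each group independently:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (key, value) pair for every word occurrence.
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values independently.
    return {key: sum(values) for key, values in groups.items()}

data = ["hadoop scales out", "hadoop brings processing to the data"]
print(reduce_phase(shuffle(map_phase(data))))
# {'hadoop': 2, 'scales': 1, 'out': 1, 'brings': 1, ...}
```

Because each reduce group can be processed independently, the same pattern scales out across the machines of a cluster.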
There are various methods of data analysis in the fields of data mining, pattern recognition, image processing, etc. Among the existing methods, K-Means is widely used. But clustering becomes more and more complex when the process is applied to large-scale datasets. The time complexity of the K-Means algorithm is O(NKD), where N is the number of objects, K the number of clusters, and D the number of iterations.
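To make that cost concrete, here is a plain-Python sketch of Lloyd's K-Means (illustrative only, not the paper's implementation; the init and seed parameters are our additions). The nested loops show where the O(NKD) cost comes from: each of the D iterations compares each of the N points against all K centers:

```python
import random

def kmeans(points, k, iterations=10, init=None, seed=0):
    # k and the initial centers must be chosen up front.
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):                      # D iterations
        clusters = [[] for _ in range(k)]
        for p in points:                             # N objects
            dists = [sum((a - b) ** 2 for a, b in zip(p, c))
                     for c in centers]               # K distance checks
            clusters[dists.index(min(dists))].append(p)
        for i, cluster in enumerate(clusters):       # recompute centers
            if cluster:
                centers[i] = tuple(sum(dim) / len(cluster)
                                   for dim in zip(*cluster))
    return centers, clusters
```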
But a disadvantage of the K-Means algorithm is that k must be initialized, and the result varies with the value of k. Another disadvantage is that it requires additional space to store the data; moreover, for a given initial seed set of cluster centers, it generates the same partition of the data irrespective of the order in which the patterns are presented. It also does not necessarily find the most optimal partition [5], and it is sensitive to the order of data input [6].
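Using the kmeans() sketch above on four corner points (a hand-picked toy example, not the paper's data), two different initializations converge to two different stable partitions, which illustrates the initialization sensitivity noted above:

```python
pts = [(0.0, 0.0), (0.0, 4.0), (10.0, 0.0), (10.0, 4.0)]
for init in ([(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (0.0, 4.0)]):
    centers, _ = kmeans(pts, k=2, init=init)
    print(centers)
# The first init settles on a left/right split (centers (0, 2) and (10, 2)),
# the second on a top/bottom split (centers (5, 0) and (5, 4)); both are
# stable under the update, but only the first minimizes the squared error.
```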
Hence, there is a need for an enhanced algorithm that can minimize the above disadvantages. Therefore, this paper introduces a novel and efficient technique for large datasets. In this paper, we propose a new technique that combines FCM with the canopy algorithm. Moreover, implementing FCM with canopy on a distributed computing platform yields better results.
The rest of the paper is organized as follows: Sect. 2 describes the architecture of the proposed method, Sect. 3 demonstrates the clustering process using canopies, Sect. 4 elaborates the Fuzzy C-Means algorithm for clustering the semi-clustered groups, and the results of the proposed method are discussed in Sect. 5. Finally, Sect. 6 concludes the paper.
2 Architecture of the Proposed Method
In the proposed architecture, the available data are initially fed as input to the canopy technique to find approximate clusters, and in the next step, the points are assigned to canopies. After obtaining these initial clusters, the resulting cluster groups are fed to the FCM algorithm, and the final clusters are obtained. Fig. 1 demonstrates the different steps in the proposed architecture.
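As a minimal sketch of the first stage, assuming Euclidean distance and two thresholds T1 > T2 (the paper's exact metric and threshold values are not given in this excerpt), canopy assignment produces cheap, possibly overlapping groups:

```python
import math

def canopy(points, t1, t2):
    # Requires t1 > t2. Each pass picks an arbitrary remaining point as a
    # canopy center, pulls in every point within the loose threshold t1,
    # and removes from further consideration those within the tight
    # threshold t2.
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)           # joins this canopy
            if d >= t2:
                still_remaining.append(p)   # may still seed/join others
        remaining = still_remaining
        canopies.append(members)
    return canopies
```

Because each point is compared only against canopy centers rather than against all other points, this pass is far cheaper than a full clustering run and partitions the work naturally for a distributed implementation.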
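For the second stage, here is a compact sketch of one Fuzzy C-Means update (assuming the standard FCM update equations with fuzzifier m = 2; the paper's parameter choices are not shown in this excerpt). Each point receives a membership degree in every cluster instead of a hard assignment:

```python
import math

def fcm_step(points, centers, m=2.0):
    # Update the membership matrix u[i][j]: degree of point i in cluster j.
    u = []
    for p in points:
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
               for dj in dists]
        u.append(row)
    # Update each center as the membership-weighted mean of all points.
    new_centers = []
    for j in range(len(centers)):
        w = [u[i][j] ** m for i in range(len(points))]
        total = sum(w)
        new_centers.append(tuple(
            sum(w[i] * points[i][d] for i in range(len(points))) / total
            for d in range(len(points[0]))))
    return u, new_centers
```

Iterating fcm_step until the centers stabilize, with the canopy centers as the initial centers, yields the final clusters described above.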