Table 1. Accuracy of Iris dataset

            K-means (Canopy initial)    MRAP                    AMRAP
            Precision   Recall          Precision   Recall      Precision   Recall
Cluster1    0.690       0.980           0.723       0.940       0.980       1
Cluster2    1           1               1           1           0.742       0.980
Cluster3    0.97        0.56            0.914       0.64        1           0.667
Average     0.887       0.847           0.877       0.860       0.907       0.882
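The Average row of Table 1 can be checked with a few lines of Python. The sketch below assumes the average is the unweighted (macro) mean over clusters, which reproduces the reported AMRAP figures; the function name `macro_average` and the dictionary layout are illustrative choices, not taken from the paper.

```python
# Per-cluster (precision, recall) pairs for AMRAP, copied from Table 1.
amrap_iris = {
    "Cluster1": (0.980, 1.0),
    "Cluster2": (0.742, 0.980),
    "Cluster3": (1.0, 0.667),
}


def macro_average(per_cluster):
    """Return the unweighted mean precision and recall over all clusters.

    Assumption: every cluster counts equally, regardless of its size.
    """
    n = len(per_cluster)
    mean_precision = sum(p for p, _ in per_cluster.values()) / n
    mean_recall = sum(r for _, r in per_cluster.values()) / n
    return round(mean_precision, 3), round(mean_recall, 3)


print(macro_average(amrap_iris))  # (0.907, 0.882)
```

The same function can be applied to the K-means and MRAP columns by swapping in their per-cluster values.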
Table 2. Accuracy of SatImage dataset

            K-means (Canopy initial)    MRAP                    AMRAP
            Precision   Recall          Precision   Recall      Precision   Recall
Cluster1    0.75        0.1472          0.0615      1           0.441       0.533
Cluster2    0.4665      0.1584          0.0745      0.1123      0.371       0.485
Cluster3    0.4474      0.2253          0           0           0.588       0.521
Cluster4    0.1823      0.3643          0           0           0.352       0.56
Cluster5    0.0543      0.4629          0           0           0.493       0.68
Average     0.466       0.267           0.026       0.22        0.449       0.556
4 Conclusions and Future Work
We propose the Adaptive Map/Reduce Affinity Propagation (AMRAP) method,
implemented on Hadoop. The main difference between AMRAP and Map/Reduce
Affinity Propagation (MRAP) is that AMRAP determines suitable preference
values automatically. AMRAP also inherits the advantage of multi-processing,
which scales as machines are added. On this architecture, AMRAP can process
large datasets with good performance, unlike a single-node system.
However, some problems persist in our experiments. If a reducer task takes
longer than 600 seconds, the job is killed because of a time-out. We have
tried increasing the maximum time-out to 1800 seconds, but the problem still
persists. Next, we will employ the SIFT dataset, which is built from images.
Each image has a different number of SIFT features, each with 128 dimensions,
and the dataset is expected to contain millions to billions of SIFT features.
In future work, we will apply AMRAP to this large SIFT dataset and solve the
time-out problem.
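The 600-second limit matches Hadoop's default task time-out (the `mapreduce.task.timeout` property, 600000 ms), which kills a task that neither reads input, writes output, nor updates its status within that window. One common mitigation, independent of raising the limit, is to report status periodically from inside the long-running reducer. The sketch below is only an illustration and assumes a Hadoop Streaming reducer written in Python, which the paper does not state; the names `reduce_with_heartbeat`, `process`, and `interval_s` are hypothetical. It relies on the Hadoop Streaming convention that a stderr line of the form `reporter:status:<message>` updates the task status.

```python
import sys
import time


def reduce_with_heartbeat(records, process, status_stream=sys.stderr,
                          interval_s=60.0):
    """Apply process() to each record and emit a status line periodically.

    Assumption: this runs as a Hadoop Streaming reducer, where stderr lines
    of the form 'reporter:status:<message>' count as task status updates.
    """
    last_report = time.monotonic()
    for count, record in enumerate(records, start=1):
        yield process(record)
        now = time.monotonic()
        # Report at most once per interval so stderr is not flooded.
        if now - last_report >= interval_s:
            status_stream.write(
                f"reporter:status:processed {count} records\n")
            status_stream.flush()
            last_report = now
```

This only helps when the long running time is spread over many records; a single `process()` call that itself exceeds the time-out would need to report status from within. In a Java job, calling `context.progress()` inside the reducer plays the same role.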