simplified replacing logic offsets the running time increase from the additional
steps for the dimension tuning, IPROCLUS as a whole doesn't cause any increase
in the execution time.
9.4.3. Results on the Colon Tumor Dataset
One challenge that gene expression data pose for clustering algorithms is the
huge number of genes (dimensions) involved. Projected clustering algorithms are
designed to deal with high dimensionality. We compared the performance of
IPROCLUS and PROCLUS on the colon tumor dataset [4]. This dataset consists of
the expression values of 2000 genes in 40 tumor and 22 normal colon tissue
samples. Since each sample is either a tumor or a normal tissue, we removed the
outlier logic from both the PROCLUS and IPROCLUS algorithms.
Table 9.5 gives a typical result for the two algorithms. The result was obtained
with k set to 2, since there are only two clusters, and l set to 124, a value
chosen by experimental analysis. On the colon tumor dataset, IPROCLUS correctly
classifies 52 of the 62 tissues, an accuracy of 83.9%, while PROCLUS achieves an
accuracy of only 53.2% (33 correctly classified). IPROCLUS is thus considerably
more accurate than PROCLUS on this dataset.
Table 9.5. Confusion Matrix for IPROCLUS and PROCLUS on the Colon Tumor Dataset

(a) IPROCLUS
               Input
Output    Normal    Tumor
1             18        6
2              4       34

(b) PROCLUS
               Input
Output    Normal    Tumor
1              5       17
2             12       28
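The accuracies quoted in the text can be rechecked from the diagonal of each confusion matrix in Table 9.5. The following is a minimal sketch, assuming that output cluster 1 is matched to the Normal class and output cluster 2 to the Tumor class (the pairing that reproduces the reported counts). The helper name `accuracy` and the variable names are illustrative, not taken from the original text.

```python
# Confusion matrices copied from Table 9.5.
# Rows are output clusters (1, 2); columns are the input classes (Normal, Tumor).
iproclus = [[18, 6], [4, 34]]
proclus = [[5, 17], [12, 28]]


def accuracy(matrix):
    """Return (correct, total, fraction correct) for a square confusion matrix.

    Assumes cluster i is matched to class i, so the correctly classified
    samples are the entries on the main diagonal.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


for name, matrix in (("IPROCLUS", iproclus), ("PROCLUS", proclus)):
    correct, total, acc = accuracy(matrix)
    print(f"{name}: {correct}/{total} = {acc:.1%}")
# IPROCLUS: 52/62 = 83.9%
# PROCLUS: 33/62 = 53.2%
```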
9.5. Conclusion
Projected clustering in high dimensional space is an interesting research topic,
and several algorithms have been proposed. We have introduced two existing
methods on this topic and noted their strengths and weaknesses. We have proposed
an effective and efficient algorithm, IPROCLUS, which is based on PROCLUS.
We have significantly improved the accuracy by proposing a modified Manhattan
segmental distance. We have reduced the dependence on the user input l by adding
a dimension tuning process at the end of the refinement phase, and we have
proposed a simplified replacing logic in the iterative phase to offset the
running time increase caused by the dimension tuning process.
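For orientation, the distance that the modification starts from can be sketched as follows. This is the standard Manhattan segmental distance used by PROCLUS, that is, the Manhattan distance restricted to a set of dimensions and averaged over the size of that set. It is not the modified version proposed for IPROCLUS, whose details are not restated in this section. The function name and the toy points are hypothetical.

```python
def manhattan_segmental_distance(x, y, dims):
    """Average absolute difference between x and y over the dimensions in dims.

    This is the unmodified PROCLUS distance, shown only as a baseline;
    dims is a non-empty collection of dimension indices.
    """
    if not dims:
        raise ValueError("dims must contain at least one dimension index")
    return sum(abs(x[d] - y[d]) for d in dims) / len(dims)


# Hypothetical toy points in a 4-dimensional space.
a = [1.0, 5.0, 2.0, 9.0]
b = [2.0, 5.0, 6.0, 1.0]

# Restricted to dimensions 0 and 2: (|1 - 2| + |2 - 6|) / 2 = 2.5
print(manhattan_segmental_distance(a, b, [0, 2]))
```

Dividing by the number of dimensions makes distances comparable between clusters whose dimension sets have different sizes, which is why the projected setting uses this form rather than the plain Manhattan distance.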