RQ1: Does fuzzy clustering with majority ranking perform better than two
well-performing learning methods in fault prediction modeling, namely
naïve Bayes and random forest?
RQ2: Does fuzzy clustering with majority ranking perform better than
naïve Bayes and random forest when two-stage outlier removal is applied
to the datasets?
RQ3: How does our proposed model perform when two different sets of
datasets are used for training and testing?
The remainder of this paper is organized as follows. Section 2 briefly discusses
related work. Section 3 reviews fuzzy clustering. Section 4 presents our proposed
method. Section 5 describes the experiments, and Section 6 reports and analyzes
the experimental results. Finally, Section 7 summarizes the paper.
2 Related Works
According to Catal [5], software fault prediction has been a noteworthy
research topic since 1990, and the field includes two recent and comprehensive
systematic literature reviews [2,6]. The prediction techniques originate from
either statistics or machine learning. They include genetic programming [7],
decision trees [8], neural networks [9], naïve Bayes [10], case-based
reasoning [11], fuzzy logic [12], and the artificial immune recognition system
algorithms [13,14,15]. Because the number of related works in this area is
large, we present only some of them in this section.
Menzies et al. [10] conducted several experiments based on different data mining
algorithms with method-level metrics on public NASA datasets. They evaluated
their work with probability of false alarm (PF), probability of detection (PD),
and balance. They reported naïve Bayes as the best performer, having applied a
log-transformation with Info-Gain filters before running the algorithms. They
claimed that the best algorithm changes according to the dataset characteristics
and that numerous experiments should be performed to obtain a robust prediction
model. They also argued that, since some models with low precision performed
well, precision is not recommended as a reliable measure for performance
evaluation. Zhang et al. [16] criticized the paper, but Menzies et al. defended
their claims in [17].
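The three measures named above can be computed from a confusion matrix. The following is a minimal sketch, assuming the usual definitions PD = TP / (TP + FN), PF = FP / (FP + TN), and balance as one minus the normalized Euclidean distance from the ideal point (PF = 0, PD = 1). The function name and the example counts are illustrative and are not taken from the cited work.

```python
import math


def pd_pf_balance(tp, fn, fp, tn):
    """Return (PD, PF, balance) from confusion-matrix counts.

    tp: faulty modules predicted faulty
    fn: faulty modules predicted fault-free
    fp: fault-free modules predicted faulty
    tn: fault-free modules predicted fault-free
    """
    # Probability of detection (recall on the faulty class).
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # Probability of false alarm (false positive rate).
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # Balance: 1 minus the distance to the ideal point (pf=0, pd=1),
    # normalized by the maximum possible distance sqrt(2).
    balance = 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
    return pd, pf, balance


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    pd, pf, bal = pd_pf_balance(tp=40, fn=10, fp=30, tn=120)
    print(f"PD={pd:.2f} PF={pf:.2f} balance={bal:.2f}")
```

A higher balance means the predictor sits closer to the ideal corner of the ROC plane, which is why it can rank a model well even when that model's precision is low.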
Mahaweerawat et al. [18] presented a new approach for predicting software
faults by means of fuzzy clustering and radial basis function techniques. They
first used fuzzy subtractive clustering to divide historical data into clusters
and then applied a radial basis function network to predict the faults that
occurred in the components residing in each cluster. In 2007 [19], they
performed a similar experiment using a self-organizing map as the classifier
instead of the fuzzy approach. In both