Increasing the Accuracy of Software Fault Prediction Using Majority Ranking Fuzzy Clustering* - Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing


• RQ1: Does fuzzy clustering with majority ranking perform better than two well-performing learning methods in fault prediction modeling, namely naïve Bayes and random forest?
• RQ2: Does fuzzy clustering with majority ranking perform better than two well-performing learning methods in fault prediction modeling, namely naïve Bayes and random forest, when two-stage outlier removal is applied to the data sets?
• RQ3: How does our proposed model perform when two different sets of data sets are used for the training and testing processes?

The remainder of this paper is organized as follows. Section 2 briefly discusses related work. Fuzzy clustering is reviewed in Section 3. Section 4 presents our proposed method. Section 5 describes the experiments, Section 6 presents the experimental results and analysis, and Section 7 summarizes the paper.

2 Related Works

According to Catal [5], software fault prediction has been a noteworthy research topic since 1990, and it has been the subject of two recent and comprehensive systematic literature reviews [2,6]. Prediction techniques use approaches that originated in either statistics or machine learning. Examples include genetic programming [7], decision trees [8], neural networks [9], naïve Bayes [10], case-based reasoning [11], fuzzy logic [12], and the artificial immune recognition system algorithms in [13,14,15]. Because the body of related work in this area is very large, we present only a selection of it in this section.

Menzies et al. [10] conducted several experiments using different data mining algorithms with method-level metrics on the public NASA datasets. They evaluated their work with probability of false alarm (PF), probability of detection (PD), and balance. They reported naïve Bayes as the best performer, and they applied a log transformation with Info-Gain filters before running the algorithms. They claimed that the best algorithm changes according to the characteristics of the dataset, so numerous experiments should be performed to obtain a robust prediction model. They also argued that, since some models with low precision performed well, precision is not recommended as a reliable parameter for performance evaluation. Zhang et al. [16] criticized the paper, but Menzies et al. defended their claim in [17].
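To make the evaluation measures above concrete, the following is a minimal sketch that computes PD, PF, and balance from a binary confusion matrix (faulty vs. non-faulty modules). It assumes the balance definition commonly paired with PD and PF in this literature, namely one minus the Euclidean distance from (PF, PD) to the ideal point (0, 1), normalized by √2. The function name `fault_prediction_metrics` and the example counts are hypothetical; precision is included only to illustrate the precision argument.

```python
import math


def fault_prediction_metrics(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float, float, float]:
    """Return (PD, PF, balance, precision) for a binary fault predictor.

    Hypothetical helper for illustration; counts refer to faulty modules as
    the positive class.
    """
    # Probability of detection: share of faulty modules that are flagged.
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # Probability of false alarm: share of non-faulty modules wrongly flagged.
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # Balance (assumed definition): 1 minus the distance from (PF, PD) to the
    # ideal point (0, 1), divided by sqrt(2) so the result lies in [0, 1].
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    # Precision: share of flagged modules that are actually faulty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return pd, pf, balance, precision


if __name__ == "__main__":
    # Hypothetical counts: 50 faulty and 150 non-faulty modules.
    pd, pf, balance, precision = fault_prediction_metrics(tp=40, fn=10, fp=30, tn=120)
    print(f"PD={pd:.2f} PF={pf:.2f} balance={balance:.2f} precision={precision:.2f}")
```

With these hypothetical counts the predictor reaches PD = 0.80, PF = 0.20, and balance = 0.80, while its precision is only about 0.57, which shows how a model can look strong on PD, PF, and balance despite low precision.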

Mahaweerawat et al. [18] presented a new approach for predicting software faults by means of fuzzy clustering and radial basis function techniques. They first used fuzzy subtractive clustering to divide historical data into clusters and then applied a radial basis function network to predict the faults that occurred in the components residing in each cluster. In 2007 [19], they performed a similar experiment using a self-organizing map as a classifier instead of the fuzzy approach. In both
