Increasing the Accuracy of Software Fault Prediction Using Majority Ranking Fuzzy Clustering* - Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing


experiments, they claimed that they could predict software faults reasonably accurately,

but unfortunately they provided very few experimental details in either

paper.

Zhong et al. [20,21] applied clustering along with an expert-based approach to

solve the fault prediction problem. They used k-means and neural-gas techniques for

clustering different real data sets and then an expert decided whether each cluster

representative should be labeled as faulty or non-faulty. After analyzing the results

in terms of the overall error rate, they affirmed that the k-means clustering-based

approach performed slightly better than the neural-gas-based approach on large data

sets.
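
The cluster-then-label workflow described above can be sketched in a few lines. This is only an illustrative sketch, not the procedure of Zhong et al.: the two metrics, the sample values, the deterministic seeding, and the expert_label rule (a hypothetical stand-in for the human expert) are all assumptions made for the example.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. The first k points seed the centroids
    (an assumption made here so the example is deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each module goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment

def expert_label(centroid, loc_threshold=100.0, complexity_threshold=10.0):
    """Hypothetical stand-in for the expert who inspects a cluster
    representative; the thresholds are invented for illustration."""
    loc, complexity = centroid
    if loc > loc_threshold or complexity > complexity_threshold:
        return "faulty"
    return "non-faulty"

# Made-up (lines of code, cyclomatic complexity) pairs, one per module.
modules = [(20, 2), (300, 25), (35, 3), (280, 30), (25, 1), (320, 22)]

centroids, assignment = kmeans(modules, k=2)
cluster_labels = [expert_label(c) for c in centroids]
# Every module inherits the label given to its cluster representative.
module_labels = [cluster_labels[a] for a in assignment]
print(module_labels)
```

The point of the design is that the expert only has to judge one representative per cluster rather than every module, so no fault labels are needed for training.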

Yuan et al. [22] used fuzzy subtractive clustering with module-order modeling

to build a prediction model. First, fuzzy subtractive clustering was used to

predict the number of faults, and then module-order modeling was applied to

predict whether modules were faulty or not. Based on the case study, it was found that

the proposed approach could classify modules that will likely have faults.

Catal and Diri [13] focused on high-performance fault predictors based on

machine learning, such as random forests and algorithms based on artificial immune

systems, on public NASA datasets. They reported that random forests provide the

best prediction performance for large datasets and Naive Bayes is the best prediction

algorithm for small datasets in terms of the area under the receiver operating

characteristic curve (AUC).
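
As a reminder of what the AUC measures, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen faulty module receives a higher predicted score than a randomly chosen non-faulty one, with ties counted as one half. The labels and scores are invented for illustration and are not taken from the NASA datasets.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.
    labels: 1 for faulty, 0 for non-faulty; scores: predicted fault-proneness."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one faulty and one non-faulty module")
    # A pair counts 1 if the faulty module is ranked higher, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Made-up ground truth and classifier scores for six modules.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
value = auc(labels, scores)
print(round(value, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the measure is convenient for comparing predictors on imbalanced fault data without fixing a classification threshold.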

Alan and Catal [23] proposed an outlier detection approach using metrics

thresholds and class labels to identify class outliers. They evaluated their approach

on public NASA datasets. They stated that their proposed outlier detection method

improves the performance of robust fault prediction models based on the Naive Bayes

and random forests algorithms.
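
The text gives no details of the detection rule, so the sketch below shows only one plausible reading of "metrics thresholds and class labels": a module is flagged when its label contradicts what the thresholds suggest. The rule itself, the threshold values, and the sample modules are assumptions for illustration and should not be taken as the method of Alan and Catal.

```python
def class_outliers(modules, thresholds):
    """Flag modules whose class label disagrees with the metric thresholds.
    Assumed rule: a faulty module with no metric above its threshold, or a
    non-faulty module with every metric above its threshold, is an outlier."""
    outliers = []
    for name, metrics, faulty in modules:
        exceeded = sum(metrics[m] > limit for m, limit in thresholds.items())
        none_exceeded = exceeded == 0
        all_exceeded = exceeded == len(thresholds)
        if (faulty and none_exceeded) or (not faulty and all_exceeded):
            outliers.append(name)
    return outliers

# Hypothetical thresholds and labeled modules (name, metrics, is_faulty).
thresholds = {"loc": 65, "complexity": 10}
modules = [
    ("a", {"loc": 20, "complexity": 2}, False),
    ("b", {"loc": 300, "complexity": 25}, True),
    ("c", {"loc": 15, "complexity": 1}, True),    # faulty, yet tiny and simple
    ("d", {"loc": 400, "complexity": 40}, False), # non-faulty, yet large and complex
    ("e", {"loc": 120, "complexity": 4}, False),  # mixed evidence, kept
]
flagged = class_outliers(modules, thresholds)
print(flagged)
```

Under this reading, the flagged modules would be removed from the training set before a classifier such as Naive Bayes or random forests is fitted.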

Rodriguez et al. [24] investigated two well-known subgroup discovery algorithms,

the SD algorithm and the CN2-SD algorithm, to obtain rules that identify

defect-prone modules. The experiments performed on object-oriented metrics

datasets from the Eclipse repository showed that the EDER-SD algorithm performs well

in most cases when compared to three other well-known SD algorithms.

3 Fuzzy Clustering

Clustering algorithms group the modules according to the similarity of their software

attributes. Program modules with similar attributes are clustered together because

they have similar quality characteristics; furthermore, the dissimilarity of data located

in separate clusters should be as high as possible. A proper data clustering technique

enhances not only the efficiency of the training process but also the prediction

precision of the model. Accurate predictions obtained from

such a good reliability model will be favorable toward higher software process
