experiments, they claimed that they could predict software faults reasonably accurately,
but unfortunately they provided very few experimental details in either
paper.
Zhong et al. [20,21] applied clustering along with an expert-based approach to
solve the fault prediction problem. They used k-means and neural-gas techniques to
cluster different real data sets, and then an expert decided whether each cluster
representative should be labeled as faulty or non-faulty. After analyzing the results
in terms of the overall error rate, they affirmed that the k-means clustering-based
approach performed slightly better than the neural-gas-based approach on large data
sets.
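The cluster-then-label idea described above can be sketched as follows. This is only an illustrative sketch, not the procedure of Zhong et al.: the two metrics (lines of code and cyclomatic complexity), the thresholds, and the function names `kmeans`, `expert_label`, and `label_modules` are all assumptions, and the rule in `expert_label` is a hypothetical stand-in for the judgment of a human expert.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct modules as initial centroids
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignments


def expert_label(centroid, loc_threshold=100.0, complexity_threshold=10.0):
    """Hypothetical stand-in for the expert who inspects a cluster representative."""
    loc, complexity = centroid
    if loc > loc_threshold or complexity > complexity_threshold:
        return "faulty"
    return "non-faulty"


def label_modules(points, k):
    """Cluster the modules, label each centroid, and propagate the label to members."""
    centroids, assignments = kmeans(points, k)
    cluster_labels = [expert_label(c) for c in centroids]
    return [cluster_labels[a] for a in assignments]


# Made-up modules described by (lines of code, cyclomatic complexity).
modules = [(20, 2), (25, 3), (30, 2), (400, 25), (450, 30), (380, 22)]
labels = label_modules(modules, k=2)
```

The point of the design is that the expert only inspects one representative per cluster rather than every module, so the labeling effort grows with the number of clusters, not with the size of the data set.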
Yuan et al. [22] used fuzzy subtractive clustering with module-order modeling
to build a prediction model. First, fuzzy subtractive clustering was used to
predict the number of faults, and then module-order modeling was applied to predict
whether modules were faulty or not. Based on the case study, it was found that
the proposed approach could classify modules that will likely have faults.
Catal and Diri [13] focused on high-performance fault predictors based on
machine learning, such as random forests and algorithms based on artificial immune
systems, on public NASA datasets. They reported that random forests provide the
best prediction performance for large datasets and naive Bayes is the best prediction
algorithm for small datasets in terms of the area under the receiver operating
characteristic curve (AUC).
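For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen faulty module receives a higher predicted score than a randomly chosen non-faulty one. The sketch below computes it directly from that pairwise definition; the function name `auc` and the example scores are illustrative assumptions and are not taken from the cited study.

```python
def auc(scores, labels):
    """AUC via pairwise comparison; labels are 1 (faulty) or 0 (non-faulty).

    Each (faulty, non-faulty) pair counts 1 if the faulty module has the
    higher score, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one faulty and one non-faulty module")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predicted fault scores and true labels for four modules.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])  # 3 of 4 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The pairwise form is quadratic in the number of modules, which is acceptable for illustration but not for large datasets.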
Alan and Catal [23] proposed an outlier detection approach using metrics
thresholds and class labels to identify class outliers. They evaluated their approach
on public NASA datasets. They stated that their proposed outlier detection method
improves the performance of robust fault prediction models based on naive Bayes
and random forests algorithms.
Rodriguez et al. [24] investigated two well-known subgroup discovery algorithms,
the SD algorithm and the CN2-SD algorithm, to obtain rules that identify
defect-prone modules. The experiments performed on object-oriented metrics datasets
from the Eclipse repository showed that the EDER-SD algorithm performs well
in most cases when compared to three other well-known SD algorithms.
3 Fuzzy Clustering
Clustering algorithms group the modules according to the similarity of their software
attributes. In fact, program modules with similar attributes are clustered together as
they have similar quality characteristics; furthermore, the dissimilarity of data located
in separate clusters should be as high as possible. A proper data clustering technique
will enhance not only the efficiency of the training process but also the predictive
precision of the model. Accurate predictions obtained from
such a good reliability model will be favorable toward higher software process