Since Integrated Software Metrics, Inc. (ISM) [28] established threshold values for only seven metrics as a deciding factor for the faultiness of modules, we clustered both the NASA and the Turkish datasets (see Section 5.1) based on these thresholds. According to our experimental results, all selected datasets performed best when the number of clusters was set to seven for the NASA sets and six for the Turkish set. The Turkish set is clustered into six clusters because one of the measured metrics is not defined for that dataset. Clustering is performed on the following metrics: lines of code, cyclomatic complexity, unique operators, unique operands, total operators, total operands, and essential complexity. Essential complexity is not available for the Turkish dataset.
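A minimal sketch of this clustering step is given below. It assumes the modules are represented as vectors of the seven (or six) metrics listed above and uses scikit-learn's KMeans; the paper does not specify the exact clustering algorithm, so the column names and the choice of k-means are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical metric column names for the seven ISM-threshold metrics.
NASA_METRICS = ["loc", "cyclomatic_complexity", "unique_operators",
                "unique_operands", "total_operators", "total_operands",
                "essential_complexity"]
TURKISH_METRICS = NASA_METRICS[:-1]  # essential complexity is unavailable

def cluster_modules(X, n_clusters):
    """Cluster software modules on their metric vectors.

    X          : (n_modules, n_metrics) array of metric values
    n_clusters : 7 for the NASA sets, 6 for the Turkish set
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    return km.cluster_centers_, labels
```

The returned cluster centers are later compared against test modules in the testing stage (Section 4.3).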
4.2 Two Stage Outlier Removal
In this part, our motivation was to use the outlier detection method of Alan and Catal [23], which relies on software metrics thresholds and fault data (each module is marked as faulty or not faulty in the software measurement dataset). Outlier removal is done in two stages. In the first stage, a data object is marked as an outlier and eliminated if five or six of its metrics exceed their corresponding thresholds while its class label is not-faulty. In the second stage, data objects are removed from the dataset if all of their metrics are below the thresholds while the class label is faulty.
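A minimal sketch of this two-stage filter follows, assuming each module is a row of metric values with a boolean faulty label and that `thresholds` holds one ISM threshold per metric; the variable names and the "at least five" interpretation of the rule are assumptions for illustration.

```python
import numpy as np

def remove_outliers(X, faulty, thresholds, min_exceed=5):
    """Two-stage outlier removal on a software measurement dataset.

    X          : (n_modules, n_metrics) metric values
    faulty     : (n_modules,) boolean class labels
    thresholds : (n_metrics,) ISM threshold per metric
    min_exceed : minimum number of exceeded thresholds (five or six here)
    """
    exceeds = X > thresholds  # per-metric threshold violations

    # Stage 1: not-faulty modules whose metrics exceed many thresholds are outliers.
    stage1 = (exceeds.sum(axis=1) >= min_exceed) & ~faulty

    # Stage 2: faulty modules with every metric below its threshold are outliers.
    stage2 = (~exceeds).all(axis=1) & faulty

    keep = ~(stage1 | stage2)
    return X[keep], faulty[keep]
```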
4.3 Testing Stage
According to Fig. 2, the testing stage consists of three main steps. First, a module is selected from the test dataset. Second, the Euclidean distances between the selected module and the cluster centers are computed to identify the three clusters most similar to the test module. Third, within each of these three clusters, the module most similar to the test module is found and its label (faulty/non-faulty) is recorded. The final decision about the label of the test module is made by majority voting over the labels of these three most similar modules. For example, if two out of the three selected modules have the label faulty, the test module is labeled as faulty.
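A minimal sketch of this prediction step is shown below, assuming `centers` comes from the clustering stage above and that the per-cluster module vectors and labels are available; the function and parameter names are illustrative.

```python
import numpy as np

def predict_label(test_module, centers, cluster_modules, cluster_labels, k=3):
    """Predict the fault label of a test module by majority vote.

    test_module     : (n_metrics,) metric vector of the module under test
    centers         : (n_clusters, n_metrics) cluster centers
    cluster_modules : list of (n_i, n_metrics) arrays, modules in each cluster
    cluster_labels  : list of (n_i,) boolean arrays, faulty labels per cluster
    k               : number of most similar clusters to inspect (3 in the text)
    """
    # Step 2: Euclidean distance from the test module to every cluster center.
    d_centers = np.linalg.norm(centers - test_module, axis=1)
    nearest_clusters = np.argsort(d_centers)[:k]

    votes = []
    for c in nearest_clusters:
        # Step 3a: inside each selected cluster, find the most similar module.
        d_modules = np.linalg.norm(cluster_modules[c] - test_module, axis=1)
        votes.append(cluster_labels[c][np.argmin(d_modules)])

    # Step 3b: majority vote over the labels of the most similar modules.
    return sum(votes) > k // 2
```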
5 Experimental Description
This section describes the information required for conducting the experiments, such as dataset selection and performance evaluation criteria.