Rule 2: If severity = critical and Time-to-fix = less than 3 days and priority = high
Then class = sw-bug. Confidence = 75.2%, Support = 2.3%
Overall, the extracted rules indicate that software-related bugs can be fixed within 3
days with above 75% confidence if they have high priority and critical severity.
Fixing a problem may take 3 months if its priority and severity are graded as medium
and serious, respectively.
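Confidence and support are the two figures attached to Rule 2 above. The sketch below shows how they can be computed for one If/Then rule over a set of problem reports, assuming the usual association-rule definitions (confidence: the share of records matching the If part that also match the Then part; support: the share of all records matching both parts). The mining tool behind Rule 2 may compute them somewhat differently. The record type `ProblemReport`, its field names, the `doc-bug` label and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProblemReport:
    # Hypothetical record layout; field names are illustrative only.
    severity: str
    priority: str
    days_to_fix: float
    bug_class: str


def rule_stats(records, antecedent, consequent):
    """Return (confidence, support) for the rule `antecedent -> consequent`.

    confidence = #records matching both parts / #records matching the If part
    support    = #records matching both parts / #records in total
    """
    covered = [r for r in records if antecedent(r)]
    both = [r for r in covered if consequent(r)]
    confidence = len(both) / len(covered) if covered else 0.0
    support = len(both) / len(records) if records else 0.0
    return confidence, support


# Rule 2 from the text, with its If part and Then part as predicates.
def rule2_if(r):
    return r.severity == "critical" and r.days_to_fix < 3 and r.priority == "high"


def rule2_then(r):
    return r.bug_class == "sw-bug"


# Invented toy data, not the study's data set.
toy = [
    ProblemReport("critical", "high", 1.0, "sw-bug"),
    ProblemReport("critical", "high", 2.5, "sw-bug"),
    ProblemReport("critical", "high", 0.5, "doc-bug"),
    ProblemReport("critical", "high", 2.0, "sw-bug"),
    ProblemReport("serious", "medium", 90.0, "sw-bug"),
]
conf, supp = rule_stats(toy, rule2_if, rule2_then)
print(f"confidence = {conf:.1%}, support = {supp:.1%}")
# → confidence = 75.0%, support = 60.0%
```

Four toy reports satisfy the If part and three of them are software bugs, so the rule's confidence is 3/4; those three reports are 3/5 of all records, which gives the support.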
The C5 software was also used to perform classification data mining, and we also
utilised boosting and cross-validation (Table 2). Boosting is a technique for generating
and combining multiple classifiers to improve predictive accuracy: over a number of
trials, several different decision trees or rule sets are built and then combined to
reduce the prediction error rate. Boosting takes longer to produce the final classifier
and does not always achieve better results than a single classifier does, especially
when the training data set is noisy. Boosting and cross-validation do not generate new
rules; rather, they try to find better rules from the existing results. The combined
classifier only produces better results than the individual trees if those trees
disagree with one another.
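To make the boosting idea concrete, here is a minimal sketch of discrete AdaBoost that uses one-feature threshold "stumps" as the weak classifiers. This is a generic textbook variant swapped in for illustration, not C5's own boosting procedure (C5 boosts full decision trees or rule sets); the function names and the toy data are hypothetical. Each trial re-weights the training cases so that the next classifier concentrates on the cases the earlier ones got wrong, and the final prediction is a weighted vote.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                # Stump predicts `sign` when feature j <= thr, otherwise -sign.
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[j] <= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] <= thr else -sign


def adaboost(X, y, trials=10):
    """Discrete AdaBoost; labels in y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform case weights
    ensemble = []
    for _ in range(trials):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid log(0) for a perfect stump
        if err >= 0.5:  # no better than chance: stop adding classifiers
            break
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified cases, lower it for correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def boosted_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: the positive class sits at both ends, so no single stump fits it.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, trials=3)
print([boosted_predict(ensemble, x) for x in X])  # → [1, 1, -1, -1, 1, 1]
```

On this toy data the best single stump still misclassifies two of the six cases, but the three stumps chosen by boosting disagree on several cases, and their weighted vote fits all of them. This is the situation described above, in which combining classifiers only helps when they disagree.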
Table 2. C5 Mining Results Summary
                            Normal mining       Mining with boosting   Mining with cross-validation (10-fold)
                            Training  Testing   Training  Testing      Training  Testing
#Rules                      51        N/A       N/A       N/A          57.7      N/A
Error rate (%), rules       41.5      42.6      41.3      42.6         43.9      42.8
Error rate (%), trees       40.3      42.5      39.4      42.6         44.1      43.1
Size of tree                141       N/A       N/A       N/A          121.9     N/A
Process time (seconds)      5.6       0.2       37.7      0.4          41.1      1.1
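The 10-fold cross-validation column in Table 2 refers to splitting the data into ten parts, training on nine of them and testing on the held-out tenth, and repeating this for each part. A minimal sketch of that loop follows. It assumes plain shuffled (unstratified) folds and reports the mean held-out error rate. C5 has its own cross-validation option, and the learner plugged in here (the hypothetical `train_majority`, which always predicts the most common training class) is only a placeholder for a real classifier.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, train, predict, k=10, seed=0):
    """Return the mean held-out error rate (%) over k folds."""
    rates = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        wrong = sum(predict(model, X[i]) != y[i] for i in test_idx)
        rates.append(100.0 * wrong / len(test_idx))
    return sum(rates) / len(rates)


# Placeholder learner: the "model" is just the majority training class.
def train_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Invented data: 30 software bugs and 10 document bugs.
X = [[i] for i in range(40)]
y = ["sw-bug"] * 30 + ["doc-bug"] * 10
print(cross_validate(X, y, train_majority, predict_majority))  # → 25.0
```

Because every training split here is dominated by `sw-bug`, the placeholder always predicts that class, and the mean held-out error equals the 25% share of `doc-bug` cases.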
Some example classification rules extracted with C5 are:
Rule 1: When a PR has low priority and the time spent is around half a day (0.5 day)
Then the rule classifies the bug as a document-related bug with high probability
(87.5% confidence).
Rule 2: When a PR has medium priority and non-critical severity and the time spent
is around 1.1 days
Then the rule classifies the bug as a document-related bug with 84.6% confidence.
Rule 3: When a PR has low priority and the time spent on fixing is around 1 week
Then the rule classifies the bug as a software bug with 83.3% confidence.
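When several extracted rules apply to the same PR, their predictions have to be combined. The sketch below assumes confidence-weighted voting, which is our understanding of how C5 rule sets classify a case: each applicable rule votes for its class with its confidence, and the class with the largest total wins. The numeric day ranges standing in for "around half a day", "around 1.1 days" and "around 1 week", the attribute names, and the class labels `doc-bug` and `sw-bug` are hypothetical; the confidences are the ones quoted above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Rule:
    label: str          # class the rule predicts
    confidence: float   # rule confidence as a fraction
    conditions: Dict[str, Callable[[object], bool]]  # attribute -> test


def classify(rules, case) -> Optional[str]:
    """Return the class with the highest summed confidence, or None if no rule fires."""
    votes = defaultdict(float)
    for rule in rules:
        if all(test(case[attr]) for attr, test in rule.conditions.items()):
            votes[rule.label] += rule.confidence
    if not votes:
        return None  # a real rule set would fall back to a default class
    return max(votes, key=votes.get)


# Rules 1-3 above; the day ranges are hypothetical stand-ins for "around".
RULES = [
    Rule("doc-bug", 0.875, {"priority": lambda p: p == "low",
                            "days": lambda d: 0.25 <= d < 0.75}),
    Rule("doc-bug", 0.846, {"priority": lambda p: p == "medium",
                            "severity": lambda s: s != "critical",
                            "days": lambda d: 0.75 <= d < 1.5}),
    Rule("sw-bug", 0.833, {"priority": lambda p: p == "low",
                           "days": lambda d: 5 <= d < 9}),
]

print(classify(RULES, {"priority": "low", "severity": "serious", "days": 0.5}))
# → doc-bug
```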
In general, all the rule sets achieve a training error rate of around 42% (the lowest is
40.3%, the highest is 43.9%) and a test error rate of around 42.5% (the lowest is 39.4%, the