Use of Data Mining in System Development Life Cycle - Data Mining: Theory, Methodology, Techniques, and Applications


correct prediction rate is achieved in the testing data set (the lowest is 43.51%, the highest is 58.25%). Another interesting point is that the attempt to improve prediction accuracy by using samples with equally distributed target values does not change much; there is only roughly a 3% improvement in the final result. The error rates from using multiple supports are higher, and the number of extracted rules is lower, than those from the single-support mining engine.
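The idea of equally distributed target-value samples can be sketched as downsampling every class to the size of the smallest one. This is only a minimal illustration, not the procedure used in the study; the function name `balance_by_class`, the record layout, and the toy class labels are assumptions made for the example.

```python
import random
from collections import defaultdict

def balance_by_class(records, label_key, seed=0):
    """Downsample every class to the size of the smallest class."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[label_key]].append(rec)
    smallest = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], smallest))
    return balanced

# Hypothetical toy data: 6 records of one class and 2 of another.
toy = [{"id": i, "cls": "sw-bug"} for i in range(6)]
toy += [{"id": i, "cls": "doc-bug"} for i in range(6, 8)]
balanced = balance_by_class(toy, "cls")
class_counts = {label: sum(1 for r in balanced if r["cls"] == label)
                for label in ("sw-bug", "doc-bug")}
```

After balancing, each class contributes the same number of records, at the cost of discarding part of the majority class.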

The continuous time values give better results than the manually discretized values. This indicates that the discretization may have caused some information loss.
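A small sketch may help show where such information loss comes from. The code maps continuous time-to-fix values onto interval labels; the function name `discretize_days` and the bin boundaries are hypothetical (only a "3 to 30 days" interval appears in this chapter, in an example rule), and the actual discretization used in the study is not given here.

```python
def discretize_days(days):
    """Map a continuous time-to-fix value (in days) to an interval label.

    The boundaries are assumed for illustration only.
    """
    if days < 3:
        return "under 3 days"
    if days <= 30:
        return "3 to 30 days"
    return "over 30 days"

values = [0.5, 4, 29, 45]
binned = [discretize_days(v) for v in values]
# 4 and 29 days fall into the same interval, so a rule miner that sees only
# the labels can no longer tell them apart.
same_bin = discretize_days(4) == discretize_days(29)
```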

Table 1. CBA Mining Results Summary. Rules are ranked by confidence.

Case          Error rate (%)         Time cost (seconds)    #Rules
              Training   Testing     Training   Testing
Case1-SS-D    46.16      52.94       1.00       0.08
Case1-SS-C    45.180     47.56       1.01       0.07
Case1-MS-D    47.059     47.49       1.01       0.10
Case1-MS-C    45.180     47.56       1.04       0.09
Case2-SS-D    59.95*     57.04       0.41       1.1
Case2-SS-C    58.09      57.39*      0.44       1.3
Case2-MS-D    59.10      58.25       0.44       1.0
Case2-MS-C    58.45      58.91       0.45       1.2
Case3-SS-D    43.61      44.5        2.2        2.0
Case3-SS-C    43.5       43.8        2.2        2.0
Case3-MS-D    46.5       45.1        1.6        1.9
Case3-MS-C    46.5       46.9        1.6
10-CV-SS-D    50.5       52.5
10-CV-SS-C    46.05      46.89       25.4
10-CV-MS-D    48.87      49.1        28.9
10-CV-MS-C    45.98      45.02*      25.3
Case4-SS-D    46.16      N/A         0.60       N/A
Case4-SS-C    45.180     N/A         0.66       N/A
Case4-MS-D    47.059     N/A         0.77       N/A
Case4-MS-C    45.180     N/A         1.04       N/A

* Value appears out of row order in the source; its Training/Testing column is inferred. Blank cells have no value in the source.

No rule has a confidence value larger than 80%; however, the rules do describe some characteristics of the PR fixing process. They are therefore useful to project management for estimating the time needed for bug fixing.

The following are examples of classification rules generated with CBA:

Rule 1: If severity = non-critical and Time-to-fix = 3 to 30 days and priority = medium, then class = doc-bug. Confidence = 82.7%, Support = 2.7%
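To make the two measures attached to a rule concrete, the sketch below computes them on toy data using the usual definitions for class association rules: support is the fraction of all records that satisfy both the conditions and the class, and confidence is the fraction of records satisfying the conditions that also have the class. The function name `rule_stats`, the field names, and the toy records (including the `sw-bug` label) are hypothetical and do not reproduce the data set or the CBA implementation used in the study.

```python
def rule_stats(records, conditions, target_class, class_key="class"):
    """Return (support, confidence) of the rule `conditions -> target_class`."""
    matches_cond = [r for r in records
                    if all(r.get(k) == v for k, v in conditions.items())]
    matches_rule = [r for r in matches_cond if r[class_key] == target_class]
    support = len(matches_rule) / len(records)
    confidence = len(matches_rule) / len(matches_cond) if matches_cond else 0.0
    return support, confidence

# Conditions of the example rule, written as attribute/value pairs.
cond = {"severity": "non-critical",
        "time_to_fix": "3 to 30 days",
        "priority": "medium"}

# Hypothetical toy data: 10 records, 4 of which satisfy the conditions.
toy = (
    [{**cond, "class": "doc-bug"}] * 3
    + [{**cond, "class": "sw-bug"}]
    + [{"severity": "critical", "time_to_fix": "over 30 days",
        "priority": "high", "class": "sw-bug"}] * 6
)

support, confidence = rule_stats(toy, cond, "doc-bug")
```

On this toy data, 3 of the 10 records satisfy the whole rule and 3 of the 4 records matching the conditions are doc-bug. A rule with high confidence but low support, like Rule 1, is therefore reliable when it fires but applies to only a small share of the records.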
