Database Reference
• How long will it take to fix?
• How many people were involved?
• What is the problem description?
Accordingly, attributes such as Severity (serious, critical or non-critical), Priority (high, medium or low), Class (sw-bug, doc-bug, change-request, duplicate, mistaken, or support), Arrival-Date, Closed-Date, Responsible, and Synopsis are considered for mining.
The attribute 'Class' is chosen as the target attribute in order to discover how the type of problem relates to the rest of the PR attributes. Knowing the relationship between the fix effort and the PR class, a project leader can analyse the fix effort against the human resources available. This knowledge can then be used in scheduling and resource planning.
Every PR has an attribute, State (open, active, analysed, suspended, feedback, resolved or closed), that indicates the current stage of the PR. Since the aim of this mining exercise is to find useful knowledge from existing projects, only PRs with a closed value in their State field are considered.
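The selection step above can be sketched in a few lines of Python. This is only an illustration: the records, the `Number` field and the case-insensitive comparison are assumptions, not details given in the text.

```python
# Hypothetical PR records; only State and Class are attribute names from the text.
# "Number" is an assumed identifier added for illustration.
prs = [
    {"Number": 1, "State": "closed", "Class": "sw-bug"},
    {"Number": 2, "State": "open", "Class": "doc-bug"},
    {"Number": 3, "State": "closed", "Class": "support"},
]


def select_closed(records):
    """Keep only PRs whose State field is 'closed'.

    Comparing case-insensitively is an assumption made here to tolerate
    inconsistent capitalisation in the raw data.
    """
    return [pr for pr in records if pr.get("State", "").strip().lower() == "closed"]


closed_prs = select_closed(prs)
```

Here `closed_prs` holds the first and third records, which would then be passed on to cleaning and mining.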
The first five fields have fixed input values. The Responsible attribute is used to calculate how many people were involved in fixing the problem. Association or classification rules are generated by applying data mining (DM) techniques to these fields. The Synopsis field holds free text. It may indicate which type of project document (a piece of code or a support document) the PR concerns, so it can be used as a text index. Text mining is considered for analysing this qualitative information.
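One simple way to treat Synopsis as a text index is an inverted index from words to the PRs that mention them. The sketch below is an assumption about how this might look, not the text mining method used in the study. The function name, the tokenisation rule and the sample synopses are all hypothetical.

```python
import re
from collections import defaultdict


def build_synopsis_index(prs):
    """Map each lower-cased word in Synopsis to the positions of the PRs containing it."""
    index = defaultdict(set)
    for position, pr in enumerate(prs):
        # Tokenise on runs of letters and digits; a deliberately simple rule.
        for word in re.findall(r"[a-z0-9]+", pr.get("Synopsis", "").lower()):
            index[word].add(position)
    return index


# Hypothetical synopses, for illustration only.
sample = [
    {"Synopsis": "Crash in parser module"},
    {"Synopsis": "Typo in user manual"},
]
index = build_synopsis_index(sample)
```

Looking up `index["manual"]` would then return the positions of PRs that concern a manual, which gives a rough hint of the document type involved.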
• Data Cleaning: The data set has some noise due to the evolution of the data acquisition system and human involvement in the process. An example is the use of different terminology over time, such as SW-bug or sw-bug as an input value for the Class field (Examples a and d in Figure 2). A Time-Zone field and other new input values were added to the system later, at management's request, based on user feedback after the system had been running for several years.
To handle erroneous PRs, attempts are made to recover the errors manually or automatically. If successful, the corrected PRs are included in the mining process. For example, SW-bug in the Class field is replaced by sw-bug throughout the data. The Completed-Date field (which became obsolete after some years of usage) is deleted, and its value (if any) is copied into the Closed-Date field.
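The two automatic repairs described above could look roughly like the following. It is a minimal sketch that assumes each PR is a dictionary of field names to strings. The function name and the date string are hypothetical, and letting Completed-Date overwrite an existing Closed-Date is an assumption, since the text does not say how a conflict is resolved.

```python
def clean_pr(pr):
    """Return a cleaned copy of one PR record (a dict of field name -> string)."""
    cleaned = dict(pr)

    # Unify the spelling of Class values, e.g. 'SW-bug' -> 'sw-bug'.
    if cleaned.get("Class"):
        cleaned["Class"] = cleaned["Class"].strip().lower()

    # Delete the obsolete Completed-Date field and copy its value, if any,
    # into Closed-Date. Assumption: Completed-Date wins when both are set.
    completed = cleaned.pop("Completed-Date", None)
    if completed:
        cleaned["Closed-Date"] = completed

    return cleaned


# Hypothetical record; the date format is illustrative only.
raw = {"Class": "SW-bug", "Completed-Date": "1997-03-02", "Closed-Date": ""}
fixed = clean_pr(raw)
```

Working on a copy leaves the raw record untouched, so a repair that turns out to be wrong can be compared against the original.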
PRs in which an error cannot be recovered precisely are either discarded or have the erroneous value replaced by a '?' if the mining software can handle missing values. For example, instance a in Figure 2 has a closing time earlier than the time it was raised. Some PRs do not have all values stored; Example c in Figure 2, for instance, has no closed date. An example of inconsistent values is also shown in Figure 2: there is no input for the Time-Zone field in PRs recorded before 1998, as the Time-Zone field was added in 1998.
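As an illustration of the '?' convention, the sketch below marks absent values and a closing date that precedes the arrival date. The date format, the function names and the sample records are assumptions. Marking Closed-Date rather than discarding the whole PR is one of the two options the text allows, chosen here arbitrarily.

```python
from datetime import datetime

MISSING = "?"  # placeholder for missing values, as described in the text
DATE_FORMAT = "%Y-%m-%d %H:%M"  # hypothetical; the real PR date format is not given


def parse_date(value):
    """Return a datetime, or None if the value is absent or cannot be parsed."""
    try:
        return datetime.strptime(value, DATE_FORMAT)
    except (TypeError, ValueError):
        return None


def mark_unrecoverable(pr, fields=("Arrival-Date", "Closed-Date", "Time-Zone")):
    """Return a copy of pr with unusable values replaced by '?'."""
    marked = dict(pr)
    # Absent or empty values become '?'.
    for name in fields:
        if not marked.get(name):
            marked[name] = MISSING
    # A PR cannot be closed before it was raised; treat the closing date as unknown.
    arrival = parse_date(marked.get("Arrival-Date"))
    closed = parse_date(marked.get("Closed-Date"))
    if arrival and closed and closed < arrival:
        marked["Closed-Date"] = MISSING
    return marked


# Hypothetical records that mimic the kinds of error described above.
early_close = mark_unrecoverable(
    {"Arrival-Date": "1997-05-10 09:00", "Closed-Date": "1997-05-01 17:00"}
)
no_close = mark_unrecoverable(
    {"Arrival-Date": "1999-01-04 10:00", "Closed-Date": "", "Time-Zone": "GMT"}
)
```

In `early_close` both Closed-Date and the absent Time-Zone end up as '?', while `no_close` keeps its Time-Zone and only has Closed-Date marked.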
• Data Transformation: The attributes Arrival-Date and Closed-Date are transformed into a time period, identifying the time spent fixing a PR, by taking into account the additional information in Time-Zone and Responsible. This transformation results in the Time-to-fix attribute with continuous values (Figure 3). The Responsible attribute holds information about the personnel engaged in rectifying the problem. We assume that the derived attribute Time-to-fix is the total time spent to fix a problem if there is only