Database Reference
How long will it take to fix?
How many people were involved?
What is the problem description?
Accordingly, attributes such as Severity (serious, critical or non-critical), Priority
(high, medium or low), Class (sw-bug, doc-bug, change-request, duplicate, mistaken,
or support), Arrival-Date, Closed-Date, Responsible, and Synopsis are considered for
mining.
The attribute 'Class' is chosen as the target attribute in order to discover any
valuable knowledge relating the type of problem to the rest of the PR attributes.
Knowing the relationship between the fix effort and the PR class, a project leader can
analyse the fix effort against the human resources available. This knowledge can then
be used in scheduling and resource planning.
Every PR has an attribute, State (open, active, analysed, suspended, feedback,
resolved and closed), to indicate the current stage of the PR. Since the aim of this
mining exercise is to find useful knowledge from existing projects, only the PRs with
a closed value in their State field are considered.
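The selection of closed PRs can be sketched as follows. This is a minimal illustration, assuming each PR is held as a Python dictionary keyed by the attribute names used above; the sample records and the `Number` field are hypothetical and are not taken from the original data set.

```python
# Hypothetical PR records; only the State and Class field names come from the text.
prs = [
    {"Number": 1, "State": "closed", "Class": "sw-bug"},
    {"Number": 2, "State": "open", "Class": "doc-bug"},
    {"Number": 3, "State": "closed", "Class": "support"},
]


def select_closed(records):
    """Keep only the PRs whose State field holds the value 'closed'."""
    return [
        pr for pr in records
        if pr.get("State", "").strip().lower() == "closed"
    ]


closed_prs = select_closed(prs)
print([pr["Number"] for pr in closed_prs])
```

The comparison ignores case and surrounding whitespace, which is an assumption made here to tolerate inconsistent input rather than something the text prescribes.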
The first five fields have fixed input values. The Responsible attribute is used to
calculate how many people were involved in fixing the problem. Association or
classification rules are generated by applying DM techniques to these fields. The
Synopsis field holds text information. It may indicate the type of project document (a
piece of code or a support document) that the PR concerns, so it can be used as
a text index. Text Mining is considered for analysing this qualitative information.
Data Cleaning: The data set has some noise due to the evolution of the data acquisition
system and human involvement in the process. An example is the use of different
terminology over time, such as SW-bug or sw-bug as an input value for the Class
field (Examples a and d in Figure 2). A Time-Zone field and other new input values were
added to the system later, at management's request, based on user feedback
after the system had been running for several years.
To handle the erroneous PRs, attempts are made to correct the errors in them
manually or automatically. If successful, the modified PRs are included in the mining
process. For example, SW-bug in the Class field is replaced by sw-bug throughout the
data. The Completed-Date field (which became obsolete after some years of use) is
deleted, and its value (if any) is copied into the Closed-Date field.
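The two automatic corrections just described can be sketched as below. This is an illustration only, assuming a PR is a dictionary of field names to strings. Lower-casing the whole Class value generalises the SW-bug to sw-bug replacement, and the date format in the usage example is hypothetical. The text does not say what happens when both date fields hold a value; this sketch simply lets the Completed-Date value win.

```python
def clean_pr(pr):
    """Return a cleaned copy of one PR record (field name -> string)."""
    cleaned = dict(pr)

    # Unify Class terminology, e.g. 'SW-bug' becomes 'sw-bug'.
    # Assumption: all valid Class values are lower case.
    if "Class" in cleaned:
        cleaned["Class"] = cleaned["Class"].strip().lower()

    # Delete the obsolete Completed-Date field and copy its value,
    # if it has one, into Closed-Date.
    completed = cleaned.pop("Completed-Date", "")
    if completed:
        cleaned["Closed-Date"] = completed

    return cleaned


raw = {"Class": "SW-bug", "Completed-Date": "1997-03-02", "Closed-Date": ""}
print(clean_pr(raw))
```

The function works on a copy, so the raw record remains available if a correction later needs to be reviewed by hand.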
PRs in which an error cannot be corrected precisely are either discarded or have the
faulty value replaced by a '?' if the mining software can handle missing values. For
example, instance a in Figure 2 has a closed time earlier than the time it was raised.
Some PRs do not have all their values stored; for example, Example c in Figure 2 has
no closed date. An example of inconsistent values is also shown in Figure 2: there is
no input for the Time-Zone field in PRs recorded before 1998, because the field was
added in 1998.
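One way to apply the discard-or-'?' rule is sketched below. It is not the authors' procedure: the date format, the decision to discard a PR only when its Arrival-Date is unusable, and the function names are all assumptions made for illustration.

```python
from datetime import datetime

MISSING = "?"             # marker for missing values, as in the text
DATE_FORMAT = "%Y-%m-%d"  # assumed date format, not given in the text


def parse_date(text):
    """Return a datetime, or None if the value is absent or malformed."""
    try:
        return datetime.strptime(text, DATE_FORMAT)
    except (TypeError, ValueError):
        return None


def handle_unrecoverable(pr):
    """Return a cleaned copy of the PR, or None if it should be discarded."""
    arrival = parse_date(pr.get("Arrival-Date"))
    if arrival is None:
        return None  # assumed policy: discard PRs with no usable arrival date

    cleaned = dict(pr)
    closed = parse_date(pr.get("Closed-Date"))
    # A closed date that is missing, or earlier than the arrival date,
    # cannot be corrected precisely, so it is marked as missing.
    if closed is None or closed < arrival:
        cleaned["Closed-Date"] = MISSING
    # PRs recorded before the Time-Zone field existed have no value for it.
    if not cleaned.get("Time-Zone"):
        cleaned["Time-Zone"] = MISSING
    return cleaned


print(handle_unrecoverable({"Arrival-Date": "1997-05-10", "Closed-Date": "1997-05-01"}))
```

Whether a '?' is acceptable depends on the mining tool used; where it is not, the same checks could return None so that the PR is discarded instead.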
Data Transformation: The attributes Arrival-Date and Closed-Date are transformed
into a time period, identifying the time spent to fix a PR, by taking into account the
additional information in Time-Zone and Responsible. This transformation results in
the Time-to-fix attribute with continuous values (Figure 3). The Responsible attribute
holds information about the personnel engaged in rectifying the problem. We assume
that the derived attribute Time-to-fix is the total time spent to fix a problem if there is only