Use of Data Mining in System Development Life Cycle - Data Mining: Theory, Methodology, Techniques, and Applications

Additionally, a large portion of the SDLC process is based on background knowledge

of the personnel involved. A DM technique should therefore be able to incorporate this prior

knowledge into its process.

Another aspect of DM that can be a problem is the presentation and visualization

of the complex results. The output of a mining process is usually a large number of

meaningful rules. However, presenting these rules in a form that helps a project

manager make strategic decisions requires significant post-processing.

Performance Issues: These include efficiency, scalability, and user effectiveness of

data mining algorithms and tools. The performance metrics for assessing the

appropriateness of DM methods for the SDLC include robustness, scalability, automatic

pre-processing capability, reliability, noise tolerance and sensitivity analysis [6]. A DM

tool should provide all (or the majority) of these to satisfy its users.

3 A Case Study: Analysis of Problem Report Data

This section describes the application of DM techniques to the software Problem

Report (PR) management data of a large global telecommunication company. When a

problem is reported, the responsible team can only roughly estimate the effort

(time) needed to fix it, based on their previous experience. If the current project

falls outside the topics they are familiar with, the estimate becomes less accurate.

The goal of this mining process is to provide an estimate of the effort needed to fix a

problem when it is raised. The results will reveal hidden relationships in the data, such as:

• How long does it take to fix a problem when a particular type of PR is raised?

• What types of project document need significant effort to fix the associated bug?

This will bring great cost savings and benefits to the organisation through improved

control over PR fixing and more accurate project planning, estimation and progress

control. The results will be especially useful to developers in problem reasoning.

When a programmer is struggling with a bug, a resolution can be suggested from the

knowledge inferred from the previous similar problems stored in PRs.
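
As a rough illustration of the first question above (how long a particular type of PR takes to fix), the sketch below averages fix times per PR type. It is a plain descriptive aggregation and not the mining technique used in the case study. The field names `pr_type` and `days_to_fix` and the toy records are assumptions made up for the example; they are not taken from the company's PR data.

```python
from collections import defaultdict
from statistics import mean


def mean_fix_time_by_type(records):
    """Return the average fix time for each PR type.

    `records` is an iterable of dicts with the hypothetical keys
    "pr_type" and "days_to_fix".
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["pr_type"]].append(record["days_to_fix"])
    return {pr_type: mean(days) for pr_type, days in groups.items()}


# Invented records, for illustration only.
toy_records = [
    {"pr_type": "sw-bug", "days_to_fix": 2},
    {"pr_type": "sw-bug", "days_to_fix": 4},
    {"pr_type": "doc-bug", "days_to_fix": 1},
]

averages = mean_fix_time_by_type(toy_records)
```

A summary of this kind only gives a baseline per category; the rules produced by a mining process would relate effort to combinations of fields rather than to a single one.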

3.1 Data Pre-processing

The first task in the process is to prepare the data set for the DM techniques to be applied.

Field Selection: The PR data consists of textual, categorical and

numerical fields. Several fields, such as confidential, submitter-ID, environment, fix,

release note, audit trail, the associated project name and the PR number, are ignored

during mining. These are used instead in the pre-processing and post-processing stages to assist

in selecting data and in understanding the rules that are found.
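
The field selection step described above could be sketched as follows. This is a minimal sketch under stated assumptions: the key spellings in `IGNORED_FIELDS` are hypothetical renderings of the fields named in the text, and the example record, including the `severity`, `priority` and `pr_class` fields, is invented for illustration.

```python
# Hypothetical key names for the fields the text says are ignored during mining.
IGNORED_FIELDS = {
    "confidential", "submitter_id", "environment", "fix",
    "release_note", "audit_trail", "project_name", "pr_number",
}


def split_fields(record):
    """Split one PR record into (fields to mine, fields set aside).

    The fields set aside are kept, not discarded, because they are
    still needed in the pre-processing and post-processing stages.
    """
    mined = {k: v for k, v in record.items() if k not in IGNORED_FIELDS}
    aside = {k: v for k, v in record.items() if k in IGNORED_FIELDS}
    return mined, aside


# Invented example record, for illustration only.
example_pr = {
    "pr_number": "PR-0001",
    "submitter_id": "u123",
    "severity": "serious",
    "priority": "high",
    "pr_class": "sw-bug",
}

mined, aside = split_fields(example_pr)
```

Returning the set-aside fields alongside the mined ones keeps the PR number and project name available for tracing a discovered rule back to the reports that support it.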

Whenever a PR is raised, a project leader has to answer the

following questions before taking any action:

• How severe is the problem (customer impact)?

• What is the impact of the problem on the project schedule (cost and team priority)?

• What type of problem is it (a software bug or a design flaw)?
