improved. An increasing number of software tools have become available for tutors to
choose from. In 2000, the Cross Industry Standard Process for Data Mining (CRISP-DM)
was published [2]. For the first time, a rigorous step-by-step industrial standard
methodology was introduced to, and endorsed by, many data mining practitioners. The
methodology provides students with a complete lifecycle to mimic and a guideline for
detailed actions and tasks to follow. Furthermore, an increasing number of successful
cases of data mining have been reported ([1], [5], [6]). These cases become good
references for students in preparation for their own projects.
A data mining project is harder than a database design project due to the uncertain
nature of data mining, and therefore faces its own difficulties and challenges. This
paper is intended to share the author's experience in this regard. The paper is a
follow-up to earlier work on the design of a data mining module for an
undergraduate computing programme [3]. A data mining project is intended as a major part
of the coursework for that module.
The rest of the paper is organised as follows. Section 2 outlines a specification of a
data mining mini-project. The paper then addresses related issues arising from the
specification, and proposes a framework for administering and assessing various
aspects of the project. In section 3, the paper uses a number of selected projects from
the author's own classes as case studies and measures their success according to the
proposed framework. In section 4, the paper highlights the uncertain nature of data
mining as well as the challenges and difficulties involved, and summarises some
useful lessons learnt.
2 Data Mining Mini-project: A Specification
2.1 Project Aim, Objectives and Scope
The data mining mini-project, referred to as the project hereafter, is concerned with
discovering possible hidden patterns from a given data set by using a data mining
software tool. The purpose of the project is to provide students with an opportunity to
experience the complete lifecycle of data mining. In particular, students are required
to follow the principles of the CRISP-DM methodology, define and undertake rele-
vant tasks, exercise judgement and make justifiable decisions over relevant issues
throughout the whole data mining process.
The key word here is experience. The project makes the students go through the
practical process and face the real challenges of making decisions in uncertain situations.
It is unrealistic, however, to treat the project as real-life data mining and expect
students to handle it as professionals. Data mining is an art that requires a lot of practice
to master. Consequently, the usefulness of the discovered patterns is much less
important than what the students learn through their experience. Again, an analogy to a
database design project can be drawn: we are more interested in the process of
developing a database than in the final database product.
Because of the exploratory nature of data mining, the project scope must be
controlled carefully. First, the project should be a group effort by 2 or 3 students,
undertaken over a period of 5 to 6 weeks. Group work enables the workload to be shared
and at the same time encourages debate over related issues. Second, the project