Data Mining Project: A Critical Element in Teaching, Learning and Assessment of a Data Mining Module - Advances in Databases

Database Reference

In-Depth Information

The report and the oral presentation should not be marked separately. It is sug-

gested that a provisional mark is first given to the work described in the report. This

mark is either confirmed or adjusted accordingly at the oral presentation.

3 Case Studies: The Good, the Bad and the Ugly

The project was first used in the author's data mining module in 2001. The project has

become an established part of the coursework ever since. The weight for the project in

the coursework has increased from a mere 40% at the beginning to 60% later and to

the current 100% in 2010. In 2007, the author created a data mining module at Sara-

jevo School of Science and Technology. The project was and still is a significant part

(60%) of the coursework for that module.

Between 2001 and 2005, the author mainly relied on data sets from the UCI Ma-

chine Learning Repository [11] and a few real-life data sets from his consultancy

work. Since then, data source was no longer a major concern. Before 2006, IBM In-

telligent Miner for Data was used as the data mining tool. Since 2006, Weka has been

used for its rich solutions, simplicity and availability for students. Microsoft Excel has

also been used for assisting data exploration and data pre-processing.

Figure 1 shows the distribution of the project works in the four categories for the

35 project groups collected between 2002 and 2010 for the purpose of this paper. It is

true that most of the classes are fairly small in size, and the biggest class size is 34

students (11 groups). Majority of project results fall into Fair and Good categories

while Unsatisfactory and Excellent projects are in the minority. This seemingly good

result reflects the practice that we mark on the project experience and what students

learn from it. This paper selects three typical projects from the 35 group projects as

case studies. The purpose is to show the nature of the tasks involved, student works

produced and assessment of their works.

15

10

6

4

Unsatisfactory

Fair

Good

Excellent

Fig. 1. Distribution of Project Categories

3.1 Project One: The Bad

The Data Set

The data set for this project is about insurance purchases. It has 14,845 recordings

about purchased product type, regional code, profession, age, first rate paid, contract-

ing sum and closing date. Potentially interesting patterns include a classification

model regarding which type of product is purchased by what kinds of customers.

Search WWH ::

Custom Search

Home