Introduction - Data Preprocessing in Data Mining

Chapter 1

Introduction

Abstract This chapter presents the main background on Data Mining and Knowledge Discovery. Major concepts used throughout the rest of the text are introduced, such as learning models, strategies and paradigms. The whole process known as Knowledge Discovery in Data is described in Sect. 1.1. A review of the main models of Data Mining is given in Sect. 1.2, accompanied by a clear differentiation between Supervised and Unsupervised learning (Sects. 1.3 and 1.4, respectively). In Sect. 1.5, apart from the two classical data mining tasks, we mention other related problems that involve more complexity or hybridizations with respect to the classical learning paradigms. Finally, we establish the relationship between Data Preprocessing and Data Mining in Sect. 1.6.

1.1 Data Mining and Knowledge Discovery

Vast amounts of data surround us: raw data that is largely intractable for human or manual processing. The analysis of such data is therefore now a necessity. The World Wide Web (WWW), business-related services, society, and applications and networks for science or engineering, among others, have been continuously generating data at an exponentially growing rate since the development of powerful storage and connection tools. This immense data growth does not easily allow useful information or organized knowledge to be understood or extracted automatically. This fact has led to the emergence of Data Mining (DM), which is now a well-known discipline increasingly present in the Information Age.

DM is, roughly speaking, about solving problems by analyzing data present in real databases. Nowadays, it is regarded as the science and technology of exploring data to discover unknown patterns already present in it. Many people treat DM as a synonym of the Knowledge Discovery in Databases (KDD) process, while others view DM as the main step of KDD [16, 24, 32].

There are various definitions of KDD. For instance, [10] define it as “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”. [11] considers the KDD process as an automatic exploratory data
