What Is Data Mining and How Does It Work? - Discrimination and Privacy in the Information Society

Chapter 2

What Is Data Mining and How Does It Work?

Toon Calders and Bart Custers

Abstract. Recent technological developments have made it possible to generate
and store increasingly large datasets. The value of data, however, is determined
not by its amount but by the ability to interpret and analyze it, and to base
future policies and decisions on the outcome of the analysis. The amounts of
data collected nowadays not only offer unprecedented opportunities to improve
decision procedures for companies and governments, but also pose great
challenges. Many pre-existing data analysis tools did not scale up to the
current data sizes. From this need, the research field of data mining emerged.
In this chapter we position data mining with respect to other data analysis
techniques and introduce the most important classes of techniques developed in
the area: pattern mining, classification, and clustering and outlier detection.
Related supporting techniques, such as pre-processing and database coupling,
are also discussed.

2.1 Introduction

In this chapter, we explain what data mining is and how it works. Section 2.2
explores data mining as a research area and compares it with related research
areas, such as statistics, machine learning, data warehousing, and online
analytical processing. Section 2.3 discusses common data mining terminology
that will be used throughout this book. Section 2.4 explains some basic
discovery algorithms: classification, clustering, and pattern mining. Section
2.5 explains some supporting techniques. These include pre-processing
techniques (such as discretization, missing value imputation, dimensionality
reduction, and feature extraction and construction) and database
