Chapter 2
What Is Data Mining and How Does It Work?
Toon Calders and Bart Custers
Abstract. Recent technological developments have made it possible to
generate and store increasingly large datasets. The value of data, however, is
determined not by its amount but by the ability to interpret and analyze it, and
to base future policies and decisions on the outcome of that analysis. The
amounts of data collected nowadays offer unprecedented opportunities to
improve decision procedures for companies and governments, but they also pose
great challenges. Many pre-existing data analysis tools did not scale up to the
current data sizes. From this need, the research field of data mining emerged. In
this chapter we position data mining with respect to other data analysis techniques
and introduce the most important classes of techniques developed in the area:
pattern mining, classification, and clustering and outlier detection. Related
supporting techniques, such as pre-processing and database coupling, are also discussed.
2.1 Introduction
In this chapter, we explain what data mining is and how it works. In Section 2.2
we start by exploring data mining as a research area and comparing it with
related research areas, such as statistics, machine learning, data warehousing and
online analytical processing. In Section 2.3 we discuss some common data mining
terminology that will be used throughout this book. In Section 2.4 we
explain some basic discovery algorithms: classification, clustering and pattern
mining. In Section 2.5 some supporting techniques are explained. These include
pre-processing techniques (such as discretization, missing value imputation,
dimensionality reduction and feature extraction and construction) and database
 