Data Analytics: Exploiting the Data Warehouse - Data Warehouse Systems: Design and Implementation


9.1 Data Mining

Data mining is the analysis of often large data sets to find unsuspected

interesting relationships and to summarize data in novel ways that are both

understandable and useful to the users. Mining information and knowledge

from large databases has become a key topic in database systems.

Thus, vendors of such systems, as well as the academic community, have

been giving increasing attention to the development of data mining tools. The

growing capability to collect and process data, enhanced by the possibilities

offered by data warehousing, has created a need for tools that

help to handle this explosive growth and to extract useful information

from such data. Data mining has emerged as an answer to these needs.

Data mining is a single step in a larger process called knowledge

discovery in databases, which aims at the extraction of nontrivial,

implicit, previously unknown, and potentially useful information from data

in databases. The knowledge discovery process involves several steps such

as data cleaning, selection, transformation, reduction, model selection, and,

finally, exploitation of the extracted knowledge.
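
The steps of the knowledge discovery process listed above can be sketched as a chain of small functions. This is only a minimal illustration under stated assumptions: the records, the field names, the age bands, and the helper functions (clean, select, transform, reduce_data, mine) are all hypothetical, and the "model" is a deliberately trivial frequency count standing in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records; the field names are illustrative only.
raw = [
    {"customer": "c1", "age": 23, "city": "Brussels", "product": "tea"},
    {"customer": "c2", "age": None, "city": "Brussels", "product": "tea"},
    {"customer": "c3", "age": 41, "city": "Liege", "product": "coffee"},
    {"customer": "c4", "age": 37, "city": "Brussels", "product": "coffee"},
    {"customer": "c5", "age": 29, "city": "Liege", "product": "tea"},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def select(records, attributes):
    """Selection: keep only the attributes relevant to the analysis."""
    return [{a: r[a] for a in attributes} for r in records]

def transform(records):
    """Transformation: replace the numeric age by a coarse age band."""
    return [
        {"age_band": "under_35" if r["age"] < 35 else "35_plus",
         "product": r["product"]}
        for r in records
    ]

def reduce_data(records):
    """Reduction: collapse identical records into counts."""
    return Counter((r["age_band"], r["product"]) for r in records)

def mine(counts):
    """Mining (assumed trivial model): most frequent product per age band."""
    best = {}
    for (band, product), n in counts.items():
        if band not in best or n > best[band][1]:
            best[band] = (product, n)
    return {band: product for band, (product, _) in best.items()}

prepared = reduce_data(transform(select(clean(raw), ["age", "product"])))
patterns = mine(prepared)

# Exploitation: here the extracted knowledge is simply reported.
print(patterns)
```

In a real setting each function would be far more elaborate, and the mining step would be chosen among competing models. The point of the sketch is only that mining is one stage among several, each consuming the output of the previous one.
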

Data mining borrows from several scientific fields, such as artificial

intelligence, statistics, and neural networks, among others, but the need for a separate

research area is justified by the size of the data collections under analysis.

The information is hidden in large and often heterogeneous collections of

data, located in several different sources, with users demanding friendly and

effective visualization tools. Moreover, typical data mining queries

cannot be answered in plain SQL, and the user often does not know

what she is looking for, so she needs an interactive

environment, which can be provided by a combination of OLAP and data

mining tools.

We next point out some requirements of data mining that are not covered by the

scientific fields from which it borrows:

- Heterogeneous data must be handled, in addition to traditional relational data. For example, textual, web, spatial, and temporal data, among others, must be supported.

- Efficient and scalable algorithms are required due to the size of the data under analysis.

- Graphical user interfaces are necessary for knowledge discovery, since nonexpert users often interact with such systems.

- Privacy-aware data mining algorithms must be developed, since data are used in strategic planning and decision making, which increases the need for data protection, in addition to compliance with privacy regulations.

- Mining at different abstraction levels also needs to be supported. Sometimes, essential knowledge that cannot be found at one level of abstraction can be discovered at finer or coarser granularity levels.
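
The last requirement, mining at different abstraction levels, can be made concrete with a small sketch. It assumes a hypothetical product-to-category hierarchy and a handful of invented transactions, and it uses plain co-occurrence counting with a minimum support threshold as a stand-in for a real frequent-pattern algorithm. At the product level no pair of items is frequent, whereas rolling the products up to their categories reveals a pattern.

```python
from collections import Counter
from itertools import combinations

# Hypothetical product-to-category hierarchy and transactions (illustrative data).
category_of = {
    "espresso": "coffee", "latte": "coffee", "mocha": "coffee",
    "croissant": "pastry", "muffin": "pastry", "scone": "pastry",
}
transactions = [
    {"espresso", "croissant"},
    {"latte", "muffin"},
    {"mocha", "scone"},
    {"espresso", "muffin"},
]

def frequent_pairs(baskets, min_support):
    """Return the item pairs occurring together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a canonical order before counting.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Finer level: individual products. Each pair occurs only once.
product_level = frequent_pairs(transactions, min_support=3)

# Coarser level: roll each product up to its category before counting.
rolled_up = [{category_of[item] for item in basket} for basket in transactions]
category_level = frequent_pairs(rolled_up, min_support=3)

print(product_level)   # {}
print(category_level)  # {('coffee', 'pastry'): 4}
```

The roll-up here plays the same role as moving to a coarser level of a dimension hierarchy in OLAP, which is one reason a combination of OLAP and data mining tools suits this kind of exploration.
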
