9.1 Data Mining
Data mining is the analysis of often large data sets to find unsuspected
interesting relationships and to summarize data in novel ways that are both
understandable and useful to the users. Mining information and knowledge
from large databases has become a key topic in database systems.
Thus, vendors of such systems, as well as the academic community, have
been giving increasing attention to the development of data mining tools. The
growing capability to collect and process data, enhanced with the possibilities
given by data warehousing, has created a need for tools that can handle
this explosive growth and extract useful information from such data. Data
mining has emerged as an answer to these needs.
Data mining is a single step in a larger process called knowledge
discovery in databases, which aims at the extraction of nontrivial,
implicit, previously unknown, and potentially useful information from data
in databases. The knowledge discovery process involves several steps such
as data cleaning, selection, transformation, reduction, model selection, and,
finally, exploitation of the extracted knowledge.
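The steps above can be sketched on a toy example. The following is a minimal illustration, not a definitive implementation: the purchase records, the function names (clean, transform, mine_pairs), and the support threshold are all hypothetical, and the mining step is a plain pair count that stands in for a real mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw purchase records: (customer, item); None marks a missing value.
raw = [
    ("c1", "bread"), ("c1", "milk"), ("c1", None),
    ("c2", "bread"), ("c2", "milk"), ("c2", "eggs"),
    ("c3", "bread"), ("c3", "eggs"),
    ("c4", "milk"), ("c4", "bread"),
]

def clean(records):
    """Data cleaning: drop records whose item is missing."""
    return [(c, i) for c, i in records if i is not None]

def transform(records):
    """Transformation: group items into one basket (set) per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return list(baskets.values())

def mine_pairs(baskets, min_support):
    """Mining step: keep item pairs found in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair appear in one canonical order.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Run the steps in sequence: cleaning, transformation, then mining.
frequent = mine_pairs(transform(clean(raw)), min_support=3)
print(frequent)
```

In this sketch the mining call is a single line at the end; most of the code prepares the data, which reflects the point that data mining is only one step of the larger process.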
Data mining borrows from several scientific fields, such as artificial intelligence,
statistics, and neural networks, among others, but the need for a separate
research area is justified by the size of the data collections under analysis.
The information is hidden in large and often heterogeneous collections of
data, located in several different sources, with users demanding friendly and
effective visualization tools. In addition, typical data mining queries
cannot be answered in plain SQL. Moreover, it is often the case that the
user does not know what she is looking for, so she needs an interactive
environment, which can be provided by a combination of OLAP and data
mining tools.
We next point out some data mining requirements that are not covered by the
scientific fields from which it borrows:
• Heterogeneous data must be handled, in addition to traditional relational
data. For example, textual, web, spatial, and temporal data, among others,
must be supported.
• Efficient and scalable algorithms are required due to the size of the data
under analysis.
• Graphical user interfaces are necessary for knowledge discovery, since it is
often the case that nonexpert users interact with such systems.
• Privacy-aware data mining algorithms must be developed, since data are
used in strategic planning and decision making, increasing the need for
data protection, in addition to privacy regulation compliance.
• Mining at different abstraction levels also needs to be supported. Sometimes,
essential knowledge that cannot be found at some level of abstraction
could be discovered at finer or coarser granularity levels.
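The last requirement can be illustrated with a small sketch. The product hierarchy, the transactions, and the support threshold below are hypothetical, chosen only to show a pattern that is invisible at the product level but appears once the data are rolled up to categories.

```python
from collections import Counter

# Hypothetical product-to-category hierarchy (an assumption for illustration).
category = {
    "skim milk": "dairy", "whole milk": "dairy", "yogurt": "dairy",
    "rye bread": "bakery", "bagel": "bakery",
}

# Hypothetical transactions recorded at the finest level (individual products).
transactions = [
    {"skim milk", "rye bread"},
    {"whole milk", "bagel"},
    {"yogurt", "rye bread"},
    {"whole milk", "rye bread"},
]

def frequent_items(baskets, min_support):
    """Return the items that appear in at least min_support baskets."""
    counts = Counter(item for basket in baskets for item in basket)
    return {item: n for item, n in counts.items() if n >= min_support}

def roll_up(baskets, hierarchy):
    """Replace each product by its category (a coarser abstraction level)."""
    return [{hierarchy[item] for item in basket} for basket in baskets]

# Same threshold, two abstraction levels.
fine = frequent_items(transactions, min_support=4)
coarse = frequent_items(roll_up(transactions, category), min_support=4)
print(fine)    # no single product reaches the threshold
print(coarse)  # both categories reach it
```

With these toy data, no individual product occurs in all four transactions, yet every transaction contains a dairy item and a bakery item, so the regularity only shows up at the coarser level.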