Data Mining Trends and Research Frontiers - Data Mining: Concepts and Techniques

framework for the development, evaluation, and practice of data mining technology.

Several theories for the basis of data mining include the following:

Data reduction: In this theory, the basis of data mining is to reduce the data representation. Data reduction trades accuracy for speed in response to the need to obtain quick approximate answers to queries on very large databases. Data reduction techniques include singular value decomposition (the driving element behind principal components analysis), wavelets, regression, log-linear models, histograms, clustering, sampling, and the construction of index trees.
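
As an illustration of trading accuracy for speed, the following minimal sketch uses one of the listed techniques, a histogram, to answer a range-count query approximately. The function names, the synthetic data, and the choice of 20 equal-width buckets are assumptions made for this example only.

```python
import random

def build_histogram(values, lo, hi, buckets):
    """Reduce the data to one count per equal-width bucket."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so a value equal to `hi` lands in the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts, width

def approx_range_count(counts, width, lo, a, b):
    """Estimate how many values fall in [a, b), assuming values are
    spread uniformly inside each bucket."""
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
        total += c * overlap / width
    return total

random.seed(0)
data = [random.uniform(0, 100) for _ in range(100_000)]  # synthetic data
counts, width = build_histogram(data, 0.0, 100.0, 20)

exact = sum(1 for v in data if 12.5 <= v < 47.5)       # scans all the data
approx = approx_range_count(counts, width, 0.0, 12.5, 47.5)  # scans 20 counts
print(exact, round(approx))
```

The approximate query touches only the 20 stored counts instead of all 100,000 values; the price is an error that depends on how uniform the data are within each bucket.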

Data compression: According to this theory, the basis of data mining is to compress the given data by encoding in terms of bits, association rules, decision trees, clusters, and so on. Encoding based on the minimum description length principle states that the “best” theory to infer from a data set is the one that minimizes the length of the theory and of the data when encoded, using the theory as a predictor for the data. This encoding is typically in bits.
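
The minimum description length idea can be sketched with a toy comparison of two candidate theories for a binary sequence: a fair coin and a biased coin. The total cost of each theory is the bits needed to state the theory plus the bits needed to encode the data with it. The parameter cost of 0.5 * log2(n) bits is a commonly used approximation and is an assumption here, as are all function names.

```python
import math

def data_bits(bits, p):
    """Bits needed to encode the sequence under a Bernoulli(p) model."""
    ones = sum(bits)
    zeros = len(bits) - ones
    total = 0.0
    if ones:
        total -= ones * math.log2(p)
    if zeros:
        total -= zeros * math.log2(1 - p)
    return total

def description_lengths(bits):
    """Total code length (theory + data) for two candidate theories."""
    n = len(bits)
    p_hat = sum(bits) / n
    return {
        # Fair coin: no parameter to transmit, one bit per observation.
        "fair": data_bits(bits, 0.5),
        # Biased coin: pay roughly 0.5 * log2(n) bits for the parameter
        # (an assumed approximation), then encode the data with p_hat.
        "biased": 0.5 * math.log2(n) + data_bits(bits, p_hat),
    }

def mdl_choice(bits):
    """Pick the theory with the shortest total description."""
    lengths = description_lengths(bits)
    return min(lengths, key=lengths.get)

skewed = [1] * 90 + [0] * 10
balanced = [1, 0] * 50
print(mdl_choice(skewed), description_lengths(skewed))
print(mdl_choice(balanced), description_lengths(balanced))
```

For the skewed sequence the extra parameter pays for itself because the data compress much better; for the balanced sequence it does not, so the simpler theory wins.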

Probability and statistical theory: According to this theory, the basis of data mining is to discover joint probability distributions of random variables, for example, Bayesian belief networks or hierarchical Bayesian models.
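
As a very small illustration of this view, the sketch below estimates an empirical joint distribution of two discrete variables by counting, then derives a conditional distribution from it. This is far simpler than a Bayesian belief network; the records and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical observations of two discrete variables: (weather, plays_tennis).
records = [
    ("sunny", "yes"), ("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
    ("rainy", "no"), ("rainy", "yes"), ("sunny", "yes"), ("rainy", "no"),
]

def joint_distribution(rows):
    """Empirical joint probability of each observed (x, y) pair."""
    counts = Counter(rows)
    n = len(rows)
    return {pair: c / n for pair, c in counts.items()}

def conditional(joint, x):
    """P(Y | X = x), derived from the joint table."""
    marginal = sum(p for (xv, _), p in joint.items() if xv == x)
    return {y: p / marginal for (xv, y), p in joint.items() if xv == x}

joint = joint_distribution(records)
given_sunny = conditional(joint, "sunny")
print(joint)
print(given_sunny)
```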

Microeconomic view: The microeconomic view considers data mining as the task of finding patterns that are interesting only to the extent that they can be used in the decision-making process of some enterprise (e.g., regarding marketing strategies and production plans). This view is one of utility, in which patterns are considered interesting if they can be acted on. Enterprises are regarded as facing optimization problems, where the object is to maximize the utility or value of a decision. In this theory, data mining becomes a nonlinear optimization problem.
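
A toy version of such an optimization problem: an enterprise can afford to run only k marketing offers, each customer responds to the best offer available to them, and the goal is to pick the offer set with the highest total value. The customers, offers, and profit figures are invented for illustration, and exhaustive search is used only because the example is tiny.

```python
from itertools import combinations

# Hypothetical data: value[c][o] = profit if customer c receives offer o.
value = {
    "ann":  {"A": 5, "B": 1, "C": 0},
    "bob":  {"A": 4, "B": 2, "C": 1},
    "cara": {"A": 0, "B": 3, "C": 6},
    "dev":  {"A": 1, "B": 3, "C": 5},
}

def total_utility(offers):
    """Each customer takes their best available offer; the max makes
    the objective nonlinear in the chosen set."""
    return sum(max(v[o] for o in offers) for v in value.values())

def best_offer_set(k):
    """Exhaustively search all offer sets of size k."""
    all_offers = sorted(next(iter(value.values())))
    return max(combinations(all_offers, k), key=total_utility)

for k in (1, 2):
    chosen = best_offer_set(k)
    print(k, chosen, total_utility(chosen))
```

Under this view, a pattern such as "customers split into an A-leaning group and a C-leaning group" is interesting only because acting on it raises the total utility of the decision.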

Pattern discovery and inductive databases: In this theory, the basis of data mining is to discover patterns occurring in the data such as associations, classification models, sequential patterns, and so on. Areas such as machine learning, neural networks, association mining, sequential pattern mining, clustering, and several other subfields contribute to this theory. A knowledge base can be viewed as a database consisting of data and patterns. A user interacts with the system by querying the data and the theory (i.e., patterns) in the knowledge base. Here, the knowledge base is actually an inductive database.
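
The idea of a knowledge base that holds both data and patterns can be sketched as follows. Frequent itemsets are found by brute-force enumeration (not an efficient algorithm such as Apriori), stored next to the raw transactions, and both parts are then queried in the same way. The transactions, the support threshold, and the dictionary layout are assumptions for this example.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def frequent_itemsets(data, min_support):
    """Brute force: count every item combination, keep those that
    appear in at least min_support transactions."""
    items = sorted(set().union(*data))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            support = sum(1 for t in data if set(combo) <= t)
            if support >= min_support:
                found[combo] = support
    return found

# The "inductive database": data and discovered patterns side by side.
knowledge_base = {
    "data": transactions,
    "patterns": frequent_itemsets(transactions, min_support=3),
}

# Query the data.
with_beer = [t for t in knowledge_base["data"] if "beer" in t]
# Query the patterns (the "theory").
pairs = {p: s for p, s in knowledge_base["patterns"].items() if len(p) == 2}
print(len(with_beer), pairs)
```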

These theories are not mutually exclusive. For example, pattern discovery can also be seen as a form of data reduction or data compression. Ideally, a theoretical framework should be able to model typical data mining tasks (e.g., association, classification, and clustering), have a probabilistic nature, be able to handle different forms of data, and consider the iterative and interactive essence of data mining. Further efforts are required to establish a well-defined framework for data mining that satisfies these requirements.
