Introduction - Data Mining: Concepts and Techniques


into the knowledge discovery process. Such knowledge can be used for pattern

evaluation as well as to guide the search toward interesting patterns.

Ad hoc data mining and data mining query languages: Query languages (e.g., SQL)

have played an important role in flexible searching because they allow users to pose

ad hoc queries. Similarly, high-level data mining query languages or other high-level

flexible user interfaces will give users the freedom to define ad hoc data mining tasks.

This should facilitate specification of the relevant sets of data for analysis, the domain

knowledge, the kinds of knowledge to be mined, and the conditions and constraints

to be enforced on the discovered patterns. Optimization of the processing of such

flexible mining requests is another promising area of study.
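As a rough illustration of what such an ad hoc task specification might carry, the sketch below bundles the four ingredients named above (relevant data, domain knowledge, kind of knowledge, and pattern constraints) into one object. The class `MiningTask`, its field names, and the sample records are hypothetical, introduced only for this example; they are not the syntax of any actual data mining query language.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MiningTask:
    """Hypothetical container for an ad hoc data mining request."""
    relevant_data: Callable[[dict], bool]      # which records to analyze
    knowledge_kind: str                        # kind of knowledge to mine
    domain_knowledge: dict = field(default_factory=dict)     # e.g., item -> category
    pattern_constraints: list = field(default_factory=list)  # checks on found patterns

    def select(self, records):
        """Return only the records the user declared relevant."""
        return [r for r in records if self.relevant_data(r)]

    def accept(self, pattern):
        """A discovered pattern is kept only if it passes every constraint."""
        return all(check(pattern) for check in self.pattern_constraints)


# Illustrative (made-up) data and task definition.
records = [
    {"region": "west", "item": "latte"},
    {"region": "east", "item": "tea"},
    {"region": "west", "item": "latte"},
]
task = MiningTask(
    relevant_data=lambda r: r["region"] == "west",
    knowledge_kind="frequent items",
    domain_knowledge={"latte": "coffee"},
    pattern_constraints=[lambda p: p["support"] >= 2],
)
selected = task.select(records)
```

Keeping the specification separate from the mining code is one way a system could optimize or reorder the request before running it, which is the kind of processing optimization mentioned above.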

Presentation and visualization of data mining results: How can a data mining system

present data mining results, vividly and flexibly, so that the discovered knowledge

can be easily understood and directly usable by humans? This is especially crucial

if the data mining process is interactive. It requires the system to adopt expressive

knowledge representations, user-friendly interfaces, and visualization techniques.

1.7.3 Efficiency and Scalability

Efficiency and scalability are always considered when comparing data mining

algorithms. As data volumes continue to grow, these two factors are especially critical.

Efficiency and scalability of data mining algorithms: Data mining algorithms must be

efficient and scalable in order to effectively extract information from huge amounts

of data in many data repositories or in dynamic data streams. In other words, the

running time of a data mining algorithm must be predictable, short, and acceptable

to applications. Efficiency, scalability, performance, optimization, and the ability to

execute in real time are key criteria that drive the development of many new data

mining algorithms.

Parallel, distributed, and incremental mining algorithms: The enormous size of many

data sets, the wide distribution of data, and the computational complexity of some

data mining methods are factors that motivate the development of parallel and

distributed data-intensive mining algorithms. Such algorithms first partition the data

into “pieces.” Each piece is processed, in parallel, by searching for patterns. The

parallel processes may interact with one another. The patterns from each partition are

eventually merged.
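The partition, process, and merge steps described above can be sketched as follows. This is a minimal illustration, assuming the "patterns" are simply frequent single items and using a thread pool as a stand-in for truly parallel or distributed workers; the function names and the sample transactions are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(transactions, n_parts):
    """Split the transactions into n_parts roughly equal pieces."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def count_items(piece):
    """Local step: count how many transactions in this piece contain each item."""
    counts = Counter()
    for transaction in piece:
        counts.update(set(transaction))  # set() so an item counts once per transaction
    return counts


def mine_frequent_items(transactions, min_support, n_parts=4):
    """Partition the data, count each piece concurrently, then merge the results."""
    pieces = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        local_counts = list(pool.map(count_items, pieces))
    merged = Counter()
    for counts in local_counts:
        merged.update(counts)  # merge step: add up the per-piece counts
    return {item: c for item, c in merged.items() if c >= min_support}


# Illustrative (made-up) transactions.
transactions = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "milk"],
    ["bread"],
    ["eggs", "milk"],
]
frequent = mine_frequent_items(transactions, min_support=3, n_parts=2)
```

In this simple case the workers never need to interact, because item counts from separate pieces can be added exactly. Richer patterns often require the interaction between processes that the text mentions, for example to exchange candidates before the final merge.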

Cloud computing and cluster computing, which use computers in a distributed

and collaborative way to tackle very large-scale computational tasks, are also active

research themes in parallel data mining. In addition, the high cost of some data mining

processes and the incremental nature of input promote incremental data mining,

which incorporates new data updates without having to mine the entire data “from

scratch.” Such methods perform knowledge modification incrementally to amend

and strengthen what was previously discovered.
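A minimal sketch of the incremental idea, again assuming the mined knowledge is just the set of frequent items: the miner keeps running counts, so each new batch only adds to what was already computed instead of rescanning old data. The class `IncrementalItemMiner` and the sample batches are hypothetical and serve only to illustrate the principle.

```python
from collections import Counter


class IncrementalItemMiner:
    """Keeps running item counts so new data never forces a full re-scan."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, new_transactions):
        """Fold a batch of new transactions into the existing counts."""
        for transaction in new_transactions:
            self.counts.update(set(transaction))
            self.n_transactions += 1

    def frequent_items(self, min_fraction):
        """Items appearing in at least min_fraction of all transactions seen so far."""
        threshold = min_fraction * self.n_transactions
        return {item for item, c in self.counts.items() if c >= threshold}


# Illustrative (made-up) batches arriving over time.
miner = IncrementalItemMiner()
miner.update([["milk", "bread"], ["milk", "eggs"]])
first = miner.frequent_items(0.6)

miner.update([["bread", "eggs"], ["bread"]])
second = miner.frequent_items(0.6)
```

Here the second batch both weakens an earlier finding ("milk" drops below the threshold) and strengthens another ("bread" rises above it), without the first batch being read again.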
