Last Words: Conclusion - Realtime Data Mining

Database Reference

In-Depth Information

statisticians or at least by experts. Rather than learning in real time, they learn from

historical data, much of which is stored in dinosaur applications like data ware-

houses. Rather than being integrated directly into applications, they run as separate

programs with unwieldy GUIs. Rather than understanding the problem as an

interaction of analysis and decision, most of them disregard the decision aspect

completely and concentrate entirely on analysis, as a result of which the question of

their interaction never even arises. Rather than formulating the problem in the

mathematically conventional operator syntax (e.g., as a differential equation), many

of them still use the terminology of neural networks, genetic algorithms, etc. Rather

than breaking down the solution hierarchically, both in terms of content and

mathematically, the often immense problems are approached as a single, gigantic,

data block, in the hope that the method will somehow cut its way through the mass

of data. Rather than carrying out local analyses on a distributed basis and only

joining the results (“taking software to the data”), all data has to be centralized, after

which it is stored in an inflexible and incomplete form in massive data warehouses

(“taking data to the software”).

By contrast, in this topic, we made an attempt at devising approaches that satisfy

the above requirements, though not entirely (in particular, we hardly addressed

the last requirement concerning distributedness in its most visionary form) but

in essence at least. This is what we have tried to illustrate in this topic. As such, it

is the trailblazer for a completely new way of thinking in data mining.

Many of the ideas presented in this topic, especially that of reinforcement

learning, have originated in artificial intelligence research. Being mathematical

computer scientists, we have been aspiring to draw a crisp distinction between

mathematical modeling of a real-world problem on one hand and devising compu-

tational methods for solving the emerging equations on the other hand. This course

of action is still somewhat uncommon in the data mining and artificial intelligence

community, where algorithms are often conceived as models of real-world agents

solving real-world problems rather than methods to solve mathematical problems

that, in turn, represent real-world problems. We believe that our mathematical

approach provides insights as to which technical assumptions these AI methods

are actually based on, under which circumstance their success can be guaranteed,

and what their limitations are. Furthermore, it enables to figure novel and more

efficient implementations that facilitate dealing with large and high-dimensional

data sets and enable realtime operation.

Nevertheless, our approach has shortcomings of its own. Many of the computa-

tional methods devised in this topic, especially multigrid methods and tensor

approximation for reinforcement learning, are based on, or, at least, inspired by

frameworks for problems arising in discretization and numerical treatment of

differential equations. The latter setting may be characterized as follows:

1. Continuity : A differential equation is a continuous model of a physical

phenomenon.

2. Physical interpretability : Mathematical structure arises for physical reasons.

3. A priori model : The parameters of the model are available beforehand.

Realtime Data Mining

Search WWH ::

Custom Search

Home