Brave New Realtime World: Introduction - Realtime Data Mining

complex assumptions, etc. For example, initially we will only calculate recommendations based on the current product and only optimize them in a single step. Later on we can then discard the second and ultimately also the first requirement, by extending the method accordingly.

A good illustration of this is the discussion about infinity. Philosophers blustered about the meaning of infinity for centuries, but it was scientists in the eighteenth century working on the specific task of infinitesimal calculus who reduced the concept of infinity to epsilon estimations. Suddenly infinity was easy to understand and merely an abstraction. This viewpoint had become generally established when at the end of the nineteenth century, while working on his continuum theory, the German mathematician Georg Cantor dropped the bombshell that infinity does in fact exist and can even be used in calculations. After much debate, this ultimately led to a greater understanding of the concept of infinity, which then found expression in philosophy too.

Conversely, however, it is often argued that complex data mining algorithms are not worthwhile because they are difficult to master. It is better, so the argument goes, to use a simple algorithm and to provide large data sets. A classic example of this is Google, which successfully uses a relatively simple search algorithm on vast data sets. There is also an example of this in the area of recommendation engines: Amazon's item-to-item collaborative filtering (ITI CF). Quite simple in mathematical terms, it has displaced the previously used collaborative filtering, which was very complex and scaled poorly.
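To make the "quite simple in mathematical terms" remark concrete, here is a minimal sketch of the general item-to-item idea: two items are similar if the same customers bought both. It is an illustration under stated assumptions, not Amazon's actual implementation. The cosine measure over sets of buyers, the function names `item_similarities` and `recommend`, and the sample baskets are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(baskets):
    """Cosine similarity between items, based on which customers bought them.

    `baskets` maps a customer id to the list of items that customer bought.
    (Assumed input format; the text does not prescribe one.)
    """
    buyers = defaultdict(set)  # item -> set of customers who bought it
    for customer, items in baskets.items():
        for item in items:
            buyers[item].add(customer)

    sims = defaultdict(dict)  # item -> {other item: similarity}
    for items in baskets.values():
        # Only pairs that share at least one customer are ever scored.
        for a, b in combinations(sorted(set(items)), 2):
            if b in sims[a]:
                continue  # pair already scored via another customer
            common = len(buyers[a] & buyers[b])
            score = common / sqrt(len(buyers[a]) * len(buyers[b]))
            sims[a][b] = score
            sims[b][a] = score
    return sims


def recommend(sims, item, k=3):
    """Return up to k items most similar to `item` (ties broken by name)."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical toy data: four customers, four books.
baskets = {
    "ann": ["book_a", "book_b"],
    "bob": ["book_a", "book_b", "book_c"],
    "cid": ["book_b", "book_c"],
    "dee": ["book_d"],
}
sims = item_similarities(baskets)
print(recommend(sims, "book_a"))
print(recommend(sims, "book_d"))
```

In this toy data `book_d` was never bought together with anything else, so the sketch has nothing to recommend for it. The method can only speak about items for which co-purchase data exists.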

Although this view seems perfectly pragmatic, and has also been successful in the cases described here, it is nevertheless shortsighted. Generally speaking, one could argue that people would still be living in caves if they had followed this way of thinking. But there are also some very specific reasons for not adopting this approach: most companies simply do not have enough data to generate meaningful recommendations in this way. Nowadays even a small bookseller can in principle offer the same millions of titles as Amazon, but with far fewer transactions to learn from, ITI CF would only generate recommendations for a small fraction of those titles. More sophisticated methods, like content-based recommendations or, better still, the hierarchical approach described in Chap. 6, are needed to resolve this problem. Moreover, the rapidly accelerating pace of the Internet world, with its constantly changing products, prices, ratings, competitors, and business models, is making realtime-capable recommendation systems indispensable.
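The data-sparsity argument can be illustrated with a small calculation. The sketch below estimates what share of a catalog a purely co-purchase-based method could cover at all, by counting the items that appear in at least one basket together with another item. The function name `cf_coverage`, the catalog size, and the baskets are hypothetical values chosen only to show the effect.

```python
def cf_coverage(catalog, baskets):
    """Fraction of catalog items that co-occur with at least one other item.

    An item bought only on its own, or never bought, has no co-purchase
    partner, so a co-purchase-based method cannot recommend anything for it.
    """
    catalog_set = set(catalog)
    covered = set()
    for items in baskets:
        unique = set(items) & catalog_set
        if len(unique) >= 2:  # only multi-item baskets yield item pairs
            covered |= unique
    return len(covered) / len(catalog_set)


# Hypothetical small shop: a large catalog but only a handful of orders.
catalog = [f"title_{i}" for i in range(1000)]
baskets = [
    ["title_1", "title_2"],
    ["title_2", "title_3", "title_4"],
    ["title_5"],  # single-item order: contributes no pairs
]
coverage = cf_coverage(catalog, baskets)
print(f"{coverage:.1%} of the catalog has any co-purchase data")
```

With these made-up numbers only four of the thousand titles are covered, which is the situation the text describes for a small seller with a large catalog.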

So the transition to more complex recommendation methods is unavoidable. That does not mean, however, that all steps have to be perfect and mathematically proven; practice has every right to rush on ahead of theory. This may seem like a contradiction of the methodology we described earlier, but it isn't. If we look at shell theory in mechanics, for example, it is still not always capable of the rigorous numerical calculation of the deformation of even simple bodies like a cylinder. Yet supercomputers can successfully simulate the deformation of an entire car in crash situations. Even if theoretically it is not entirely rigorous, should scientists wait for another 100 years until shell theory is sufficiently mature before performing crash simulations? Should thousands more people be allowed to lose
