complex assumptions, etc. For example, initially we will only calculate
recommendations based on the current product and only optimize them in a single
step. Later on we can discard the second and ultimately also the first
restriction by extending the method accordingly.
A good illustration of this is the discussion about infinity. Philosophers blustered
about the meaning of infinity for centuries, but it was scientists in the eighteenth
century working on the specific task of infinitesimal calculus who reduced the
concept of infinity to epsilon estimations. Suddenly infinity was easy to understand
and merely an abstraction. This viewpoint had become generally established when
at the end of the nineteenth century, while working on his continuum theory, the
German mathematician Georg Cantor dropped the bombshell that infinity does in
fact exist and can even be used in calculations. After much debate, this ultimately
led to a greater understanding of the concept of infinity, which then found
expression in philosophy too.
Conversely, however, it is often argued that complex data mining algorithms are
not worthwhile because they are difficult to master. It is better, so the argument
goes, to use a simple algorithm and feed it large data sets. A classic example of
this is Google, which successfully uses a relatively simple search algorithm on vast
data sets. There is also an example in the area of recommendation engines:
Amazon's item-to-item collaborative filtering (ITI CF). Quite simple in
mathematical terms, it has displaced the previously used collaborative filtering,
which was very complex and scaled poorly.
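To make the simplicity of the item-to-item idea concrete, the following is a minimal sketch, not Amazon's actual implementation. It assumes that purchase data is available as a list of baskets and scores item pairs by the cosine similarity of their co-occurrence counts; the function names and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_to_item_similarities(baskets):
    """Cosine similarity between items, based on co-occurrence in baskets."""
    item_count = defaultdict(int)  # number of baskets containing each item
    pair_count = defaultdict(int)  # number of baskets containing both items
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1

    similar = defaultdict(dict)
    for (a, b), both in pair_count.items():
        score = both / sqrt(item_count[a] * item_count[b])
        similar[a][b] = score
        similar[b][a] = score
    return similar


def recommend(similar, item, top_n=3):
    """Return up to top_n items most similar to the given item."""
    candidates = similar.get(item, {})
    # Sort by descending score; break ties by item name for determinism.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical sample data: each inner list is one customer's basket.
baskets = [
    ["book_a", "book_b"],
    ["book_a", "book_b", "book_c"],
    ["book_a", "book_c"],
    ["book_d"],
]
similar = item_to_item_similarities(baskets)
print(recommend(similar, "book_a"))
print(recommend(similar, "book_d"))
```

In this sketch, an item that never appears in a basket together with another item, such as `book_d`, receives no recommendations at all. The method can only score pairs for which co-occurrence data exists.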
Although this view seems perfectly pragmatic, and in the cases described here
has been successful too, it is nevertheless shortsighted. Generally speaking, one
could argue that people would still be living in caves if they had followed this way
of thinking. But there are also some very specific reasons for not adopting this
approach. Most companies simply do not have enough data to generate meaningful
recommendations in this way. Nowadays even a small bookseller can in principle
offer the same millions of titles as Amazon, but it has far fewer transactions
per title, so ITI CF would only generate recommendations for a small fraction of
them. More sophisticated methods, like content-based recommendations or, better
still, the hierarchical approach described in Chap. 6, are needed to resolve this
problem. Moreover, the rapidly
accelerating pace of the Internet world, with its constantly changing products,
prices, ratings, competitors, and business models, is making realtime-capable
recommendation systems indispensable.
So the transition to more complex recommendation methods is unavoidable.
That does not mean, however, that all steps have to be perfect and mathematically
proven; practice has every right to rush on ahead of theory. This may seem like a
contradiction of the methodology we described earlier, but it isn't. Shell theory
in mechanics, for example, is still not always capable of rigorously calculating
the deformation of even simple bodies like a cylinder. Yet supercomputers can
successfully simulate the deformation of an entire car in crash situations. Even
if such simulations are not entirely rigorous in theory, should scientists wait
another 100 years until shell theory is sufficiently mature before
performing crash simulations? Should thousands more people be allowed to lose