Web applications are less query oriented and more concurrency oriented. Millions of users log into the system across all time zones. While the transactions they generate may not be volume intensive individually, their frequency is very high: tweets, Facebook status updates, photo uploads and downloads, music sharing, and so on. The sheer scale of user concurrency, combined with the frequency of transactions, imposes significant challenges on database processing.
Computation Intensiveness: Computation intensiveness can mean two things: the complexity of the algorithm or the complexity of the data set. Running complex algorithms over moderately complex data sets can be a performance challenge. On the other hand, simple algorithms running over very large data sets can also cause severe performance issues.
There is no hard and fast definition of what constitutes a complex computation. Typically, however, such computations involve transaction-level data, apply multiple business rules that require multiple joins, and generate unpredictable queries that are often forced to resort to full table scans. Perhaps a reasonable definition would be that a complex computation always involves multiple set operations: you make a selection, and then, based on the result of that selection, you go on to make further selections. In other words, complexity involves recursive set operations.
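As a rough illustration of such a "selection on a selection," consider the sketch below. The table, columns, thresholds, and cutoff date are hypothetical, chosen only to show how one result set feeds the next.

```python
# Minimal sketch of chained set operations: the inner query produces a result
# set, and the outer query selects further from it. All names and values are
# hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 500.0, '2023-01-15'), (1, 20.0, '2023-08-01'),
                  (2, 750.0, '2023-02-10'), (3, 30.0, '2023-07-05')])

# Step 1 (inner selection): customers whose total spend exceeds a threshold.
# Step 2 (outer selection): of those, the ones with no orders after a cutoff,
# i.e. high-value customers who appear to have lapsed.
lapsed_high_value = conn.execute("""
    SELECT customer_id
    FROM (SELECT customer_id, SUM(amount) AS total
          FROM orders GROUP BY customer_id HAVING SUM(amount) > 400)
    WHERE customer_id NOT IN (
        SELECT customer_id FROM orders WHERE order_date > '2023-06-01')
""").fetchall()

print(lapsed_high_value)  # -> [(2,)]
```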
In non-technical terms, complex computations often require analyzing and comparing different data sets. Some typical complex queries are as follows:
“To what extent has our new service cannibalized existing
products?” - That is, which customers are using the new service
as a replacement for the old ones rather than in addition to them.
“List the top 10 percent of customers most likely to respond to our
new marketing campaign.”
“What aspects of a bill are most likely to lead to customer
defection?”
“Are employees more likely to be sick when they are overdue for a
holiday?”
“Which promotions shorten sales cycles the most?”
Consider just the question about the top 10 percent of customers. To answer it we need to analyze previous marketing campaigns, work out which customers responded (not easy in itself: it often means a time-lapsed comparison between the campaign and subsequent purchases), and identify common characteristics shared by those customers. We then need to search the candidate recipients of the new campaign for those who share those characteristics and rank them (doing this correctly may require significant input) according to the closeness of their match to the identified characteristics.
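The sketch below walks through those steps in miniature. The in-memory data, the 30-day response window, and the overlap-based similarity score are all assumptions made purely for illustration, not a prescribed method.

```python
# Rough sketch of the steps described above, with hypothetical in-memory data
# standing in for the campaign, purchase, and customer-attribute tables.
from datetime import date, timedelta

# Hypothetical source data.
campaign_sent = {101: date(2023, 3, 1), 102: date(2023, 3, 1), 103: date(2023, 3, 1)}
purchases = [(101, date(2023, 3, 10)), (103, date(2023, 5, 20)), (104, date(2023, 3, 5))]
attributes = {  # customer_id -> set of characteristics
    101: {"urban", "prepaid", "high_usage"},
    102: {"rural", "postpaid"},
    103: {"urban", "postpaid", "high_usage"},
    104: {"urban", "prepaid"},
}

# Step 1: time-lapsed comparison -- a customer "responded" if they purchased
# within 30 days of receiving the previous campaign (an assumed window).
window = timedelta(days=30)
responders = {cid for cid, bought in purchases
              if cid in campaign_sent
              and campaign_sent[cid] <= bought <= campaign_sent[cid] + window}

# Step 2: characteristics common to the responders.
profile = set.intersection(*(attributes[cid] for cid in responders))

# Step 3: rank the remaining candidates by closeness of match to that profile.
candidates = set(attributes) - responders
ranked = sorted(candidates,
                key=lambda cid: len(attributes[cid] & profile) / max(len(profile), 1),
                reverse=True)

top_10_percent = ranked[: max(1, len(ranked) // 10)]
print(top_10_percent)
```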
Few would question the premise that this is a complex computation. You could answer it using a conventional relational database, but it would be slow and time consuming. Now extend the use case to include all multi-channel users, and web scale alone throws millions of customers into the mix.
Another aspect of complexity is the predicate: a selection criterion, such as one derived from a business rule, applied to certain key attributes in a data set. Complex predicates put considerable pressure on the performance and efficiency of the database.
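As a concrete illustration, the sketch below expresses a business rule as a predicate over key attributes. The table, columns, and rule are hypothetical; the point is only that the WHERE clause is where such rules land, and non-selective or unindexed predicates of this kind can force full table scans.

```python
# Hypothetical business-rule predicate applied to key attributes of a billing table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (customer_id INTEGER, plan TEXT, overage REAL, late_fees REAL)")
conn.executemany("INSERT INTO bills VALUES (?, ?, ?, ?)",
                 [(1, 'prepaid', 0.0, 0.0),
                  (2, 'postpaid', 42.5, 10.0),
                  (3, 'postpaid', 5.0, 0.0)])

# The WHERE clause is the predicate: an "at-risk bill" rule expressed as
# conditions on the key attributes plan, overage, and late_fees.
at_risk = conn.execute("""
    SELECT customer_id FROM bills
    WHERE plan = 'postpaid' AND (overage > 20 OR late_fees > 0)
""").fetchall()

print(at_risk)  # -> [(2,)]
```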
 