Web applications are less query oriented and more concurrency oriented. Millions of users log into the system across all time zones. While the transactions they generate may not be volume intensive individually, their frequency is very high: tweets, Facebook status updates, photo uploads and downloads, music sharing, and so on. The sheer scale of user concurrency, combined with the frequency of transactions, imposes significant challenges on database processing.
Computation Intensiveness: Computation intensiveness can mean two things: the complexity of the algorithm or the complexity of the data set. Running complex algorithms over moderately complex data sets can be a performance challenge. On the other hand, simple algorithms running over very large data sets can also cause severe performance issues.
There is no hard and fast definition of what constitutes a complex computation. Typically, however, such computations involve transaction-level data, apply multiple business rules that require multiple joins, and generate unpredictable queries that are often forced to resort to full table scans. Perhaps a reasonable definition would be that a complex computation always involves multiple set operations: you make a selection, and then, based on the result of that selection, you go on to make further selections. In other words, complexity involves recursive set operations.
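As a rough illustration of such a "selection on a selection," consider the sketch below. The table, columns, thresholds, and cutoff date are hypothetical, chosen only to show how one result set feeds the next.

```python
# Minimal sketch of chained set operations: the inner query produces a result
# set, and the outer query selects further from it. All names and values are
# hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 500.0, '2023-01-15'), (1, 20.0, '2023-08-01'),
                  (2, 750.0, '2023-02-10'), (3, 30.0, '2023-07-05')])

# Step 1 (inner selection): customers whose total spend exceeds a threshold.
# Step 2 (outer selection): of those, the ones with no orders after a cutoff,
# i.e. high-value customers who appear to have lapsed.
lapsed_high_value = conn.execute("""
    SELECT customer_id
    FROM (SELECT customer_id, SUM(amount) AS total
          FROM orders GROUP BY customer_id HAVING SUM(amount) > 400)
    WHERE customer_id NOT IN (
        SELECT customer_id FROM orders WHERE order_date > '2023-06-01')
""").fetchall()

print(lapsed_high_value)  # -> [(2,)]
```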
In non-technical terms, complex computations often require analyzing and comparing different data sets. Some typical complex queries are as follows:
“To what extent has our new service cannibalized existing
products?” - That is, which customers are using the new service
as a replacement for the old ones rather than in addition to them.
“List the top 10 percent of customers most likely to respond to our
new marketing campaign.”
“What aspects of a bill are most likely to lead to customer
defection?”
“Are employees more likely to be sick when they are overdue for a
holiday?”
“Which promotions shorten sales cycles the most?”
Consider just the question about the top 10 percent of customers. To answer it we need to analyze previous marketing campaigns, work out which customers responded (not easy in itself: it often means a time-lapsed comparison between the campaign and subsequent purchases), and identify common characteristics shared by those customers. We then need to search the candidate recipients of the new campaign for those who share those characteristics and rank them (doing this correctly may require significant input) according to the closeness of their match to the identified characteristics.
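The sketch below walks through those steps in miniature. The in-memory data, the 30-day response window, and the overlap-based similarity score are all assumptions made purely for illustration, not a prescribed method.

```python
# Rough sketch of the steps described above, with hypothetical in-memory data
# standing in for the campaign, purchase, and customer-attribute tables.
from datetime import date, timedelta

# Hypothetical source data.
campaign_sent = {101: date(2023, 3, 1), 102: date(2023, 3, 1), 103: date(2023, 3, 1)}
purchases = [(101, date(2023, 3, 10)), (103, date(2023, 5, 20)), (104, date(2023, 3, 5))]
attributes = {  # customer_id -> set of characteristics
    101: {"urban", "prepaid", "high_usage"},
    102: {"rural", "postpaid"},
    103: {"urban", "postpaid", "high_usage"},
    104: {"urban", "prepaid"},
}

# Step 1: time-lapsed comparison -- a customer "responded" if they purchased
# within 30 days of receiving the previous campaign (an assumed window).
window = timedelta(days=30)
responders = {cid for cid, bought in purchases
              if cid in campaign_sent
              and campaign_sent[cid] <= bought <= campaign_sent[cid] + window}

# Step 2: characteristics common to the responders.
profile = set.intersection(*(attributes[cid] for cid in responders))

# Step 3: rank the remaining candidates by closeness of match to that profile.
candidates = set(attributes) - responders
ranked = sorted(candidates,
                key=lambda cid: len(attributes[cid] & profile) / max(len(profile), 1),
                reverse=True)

top_10_percent = ranked[: max(1, len(ranked) // 10)]
print(top_10_percent)
```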
Few would question the premise that this is a complex computation. You could answer it using a conventional relational database, but it would be slow and time consuming. Now extend the use case to include all multi-channel users, and web scale alone throws millions of customers into the mix.
Another aspect of complexity is the predicate: a selection criterion, such as one derived from a business rule, applied to certain key attributes in a data set. Complex predicates put considerable pressure on the performance and efficiency of the database.
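As a concrete illustration, the sketch below expresses a business rule as a predicate over key attributes. The table, columns, and rule are hypothetical; the point is only that the WHERE clause is where such rules land, and non-selective or unindexed predicates of this kind can force full table scans.

```python
# Hypothetical business-rule predicate applied to key attributes of a billing table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (customer_id INTEGER, plan TEXT, overage REAL, late_fees REAL)")
conn.executemany("INSERT INTO bills VALUES (?, ?, ?, ?)",
                 [(1, 'prepaid', 0.0, 0.0),
                  (2, 'postpaid', 42.5, 10.0),
                  (3, 'postpaid', 5.0, 0.0)])

# The WHERE clause is the predicate: an "at-risk bill" rule expressed as
# conditions on the key attributes plan, overage, and late_fees.
at_risk = conn.execute("""
    SELECT customer_id FROM bills
    WHERE plan = 'postpaid' AND (overage > 20 OR late_fees > 0)
""").fetchall()

print(at_risk)  # -> [(2,)]
```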
 