Problems in the RDBMS world
RDBMS is a great approach. It keeps data consistent, it is well suited to OLTP (http://en.wikipedia.org/wiki/Online_transaction_processing), and it provides a good grammar (SQL) to access and manipulate data, supported by all the popular programming languages. It has been tremendously successful over the last 40 years (the relational data model was first proposed by E. F. Codd in his 1970 research paper, A Relational Model of Data for Large Shared Data Banks). However, in the early 2000s, big companies such as Google and Amazon, which had gigantic loads to serve from their databases, started to feel bottlenecked by RDBMS, even with helper services such as Memcache on top of them. As a response to this, Google came up with BigTable (http://research.google.com/archive/bigtable.html), and Amazon with Dynamo (http://www.cs.ucsb.edu/~agrawal/fall2009/dynamo.pdf).
If you have ever used an RDBMS for a complicated web application, you must have faced problems such as slow queries due to complex joins, expensive vertical scaling, and difficulties in horizontal scaling; on large tables, building indexes also takes a long time. At some point, you may have chosen to replicate the database, but there was still some locking, and this hurt the availability of the system: under a heavy load, locking causes the user's experience to deteriorate.
Although replication gives some relief, a busy slave may not catch up with the master (or there may be a connectivity glitch between the master and the slave), so the consistency of such systems cannot be guaranteed. Consistency, the property of a database to remain in a consistent state before and after a transaction, is one of the promises made by relational databases, yet it seems one may have to compromise on it for the sake of scalability. As the application grows, the demand to scale the backend becomes more pressing, and the developer team may decide to add a caching layer (such as Memcached) on top of the database. This takes some load off the database, but now the developers need to maintain the object state in two places: the database and the caching layer. Although some Object Relational Mappers (ORMs) provide a built-in caching mechanism, they have their own issues, such as a larger memory footprint, and the mapping code often pollutes the application code. To get more out of an RDBMS, we need to start denormalizing the database to avoid joins, and to keep precomputed aggregates in columns to avoid expensive statistical queries.
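To make the "state in two places" problem concrete, the following is a minimal sketch of the cache-aside pattern that such a caching layer typically implies. It is an illustration only: the cache and database dictionaries stand in for Memcached and the relational store, and get_user/update_user are hypothetical helpers, not part of any library mentioned above.

import json

# Stand-ins for the two places state now lives: the cache and the database.
# A real deployment would use a Memcached client here; plain dicts keep the
# sketch self-contained and runnable.
cache = {}                                          # hypothetical caching layer
database = {42: {"name": "Alice", "visits": 7}}     # hypothetical users table

def get_user(user_id):
    """Read path: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return json.loads(cache[key])
    row = database[user_id]            # the expensive query we are trying to avoid
    cache[key] = json.dumps(row)       # populate the cache for the next read
    return row

def update_user(user_id, **changes):
    """Write path: the database *and* the cached copy must be kept in sync."""
    database[user_id].update(changes)
    cache.pop(f"user:{user_id}", None) # invalidate, or risk serving stale data

# Usage: the first read misses the cache, the second is served from it; the
# update invalidates the entry so the following read sees fresh data.
print(get_user(42))
update_user(42, visits=8)
print(get_user(42))

Every write path now has to remember to invalidate (or refresh) the cached copy; forgetting to do so in even one place is exactly the kind of stale-state bug that makes this extra layer a maintenance burden.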
Sharding, or horizontal scaling, is another way to distribute the load. Sharding in itself is a good idea, but it adds a lot of manual work, plus the knowledge of sharding creeps into the application code.