Table 1.2 The key case studies associated with the NoSQL movement—the name of the case study/standard, the business drivers, and the results (findings) of the selected solutions (continued)

Case study/standard: Google's Bigtable
  Driver: Need to flexibly store tabular data in a distributed system.
  Finding: By using a sparse matrix approach, users can think of all data as being stored in a single table with billions of rows and millions of columns without the need for up-front data modeling.

Case study/standard: Amazon's Dynamo
  Driver: Need to accept a web order 24 hours a day, 7 days a week.
  Finding: A key-value store with a simple interface can be replicated even when there are large volumes of data to be processed.

Case study/standard: MarkLogic
  Driver: Need to query large collections of XML documents stored on commodity hardware using standard query languages.
  Finding: By distributing queries to commodity servers that contain indexes of XML documents, each server can be responsible for processing data on its own local disk and returning the results to a query server.
1.3.1 Case study: LiveJournal's Memcache
Engineers working on the blogging system LiveJournal started to look at how their systems were using their most precious resource: the RAM in each web server. LiveJournal had a problem. Their website was so popular that the number of visitors using the site continued to increase on a daily basis. The only way they could keep up with demand was to continue to add more web servers, each with its own separate RAM.

To improve performance, the LiveJournal engineers found ways to keep the results of the most frequently used database queries in RAM, avoiding the expense of rerunning the same SQL queries on their database. But each web server had its own copy of the query results in RAM; there was no way for any web server to know that the server next to it in the rack already had a copy of the same query results sitting in RAM.
So the engineers at LiveJournal devised a simple way to create a distinct "signature" of every SQL query. This signature, or hash, was a short string that represented a SQL SELECT statement. By sending a small message between web servers, any web server could ask the other servers whether they already had the results of a given SQL query. If one did, it would return the results and avoid an expensive round trip to the already overwhelmed SQL database. They called their new system Memcache because it managed a RAM memory cache.
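The idea above can be sketched in a few lines. This is a minimal illustration, not LiveJournal's actual code: the in-process dictionary stands in for the shared pool of memcached servers, and `run_query` is a hypothetical placeholder for the expensive database call.

```python
import hashlib

# Hypothetical stand-in for the shared pool of memcached servers.
cache = {}

def query_signature(sql):
    """Create a distinct "signature" (hash) of an SQL query string."""
    return hashlib.sha1(sql.encode("utf-8")).hexdigest()

def run_query(sql):
    # Placeholder for the expensive round trip to the SQL database.
    return [("alice",), ("bob",)]

def cached_query(sql):
    key = query_signature(sql)
    if key in cache:
        # Some server already executed this query; reuse its results.
        return cache[key]
    result = run_query(sql)   # cache miss: hit the database once
    cache[key] = result       # store under the signature for everyone
    return result
```

Because the signature is derived only from the query text, any server that hashes the same SELECT statement computes the same key and finds the cached results, which is what lets the servers share work without coordinating up front.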
Many other software engineers had come across this problem in the past. The concept of large pools of shared-memory servers wasn't new. What was different this time was that the engineers at LiveJournal went one step further. They not only made this system work (and work well), they shared their software using an open source license, and they also standardized the communications protocol between the web front ends (called the memcached protocol). Now anyone who wanted to keep their database from getting overwhelmed with repetitive queries could use their front-end tools.