FIGURE 4.1
Distributed data processing in the RDBMS. [Figure labels: a Primary DB and DB instances in Geo 1, Geo 2, Geo 3, and Geo 4.]
include MongoDB, Neo4j, Riak, Amazon DynamoDB, MemcachedDB, BerkeleyDB, Voldemort, and
many more. Though many of these platforms were originally developed and deployed to meet the
data processing needs of web applications and search engines, they have evolved to support other data
processing requirements. The rest of this chapter explains how these platforms manage data
processing. It is not a tutorial for step-by-step configuration and usage of
these technologies. References are provided at the end for further reading.
Distributed data processing
Before we proceed to understand how Big Data technologies work and see the associated reference
architectures, let us recap distributed data processing.
Distributed data processing has been in existence since the late 1970s. The primary concept was
to replicate the DBMS in a master-slave configuration and process data across multiple instances
(Figure 4.1). Each slave would engage in a two-phase commit with its master during query
processing. Several papers on the subject and on the design of its early implementations have been
authored by Dr. Stonebraker,¹ Teradata, University of California at Berkeley departments, and others.
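The two-phase commit mentioned above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular DBMS implements it: the `Replica` class, its `healthy` flag, and the `two_phase_commit` function are hypothetical names introduced here, and real systems also handle timeouts, logging, and coordinator failure, which this sketch omits.

```python
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


@dataclass
class Replica:
    """A hypothetical replica DB instance (for example, one per geography)."""
    name: str
    healthy: bool = True
    committed: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, txn_id, writes):
        # Phase 1: stage the writes and vote; an unhealthy replica votes to abort.
        if not self.healthy:
            return Vote.ABORT
        self.staged[txn_id] = dict(writes)
        return Vote.COMMIT

    def commit(self, txn_id):
        # Phase 2 (all voted commit): apply the staged writes.
        self.committed.update(self.staged.pop(txn_id))

    def rollback(self, txn_id):
        # Phase 2 (any abort vote): discard whatever was staged.
        self.staged.pop(txn_id, None)


def two_phase_commit(replicas, txn_id, writes):
    """Coordinator logic: return True if all replicas committed, else False."""
    votes = [r.prepare(txn_id, writes) for r in replicas]
    if all(v is Vote.COMMIT for v in votes):
        for r in replicas:
            r.commit(txn_id)
        return True
    for r in replicas:
        r.rollback(txn_id)
    return False
```

Calling `two_phase_commit([Replica("geo1"), Replica("geo2")], "t1", {"k": 1})` applies the write on both replicas; if any replica has `healthy=False`, every replica discards the staged write and the call returns `False`. Note that each transaction needs two rounds of messages to every replica, which is one source of the latency discussed later in this section.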
Several commercial and early open-source DBMSs have addressed large-scale data processing
with distributed data management algorithms; however, they all faced problems in the areas
of concurrency, fault tolerance, support for multiple redundant copies of data, and distributed
processing of programs. A bigger barrier was the cost of infrastructure.
Why did distributed data processing fail to meet the requirements in the relational data processing
architecture? Its success was hit or miss, depending on the complexity of the architecture. The
answer to this question lies in multiple dimensions:
• Dependency on RDBMS:
  • ACID (atomicity, consistency, isolation, and durability) compliance for transaction management
  • Complex architectures for consistency management
  • Latencies across the system
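To make the ACID requirement concrete, the following is a small single-node sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical, chosen only for illustration. The point is that a transaction either applies all of its writes or none; a distributed RDBMS must preserve the same guarantee across every instance, which is where the architectural complexity and latency listed above come from.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failed debit leaves a write that must be undone.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

A first `transfer(conn, "a", "b", 60)` succeeds; a second identical call would overdraw account `a`, so the database rejects the debit and the already-applied credit to `b` is rolled back with it.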
¹ DeWitt, D. J., & Stonebraker, M. (2008). MapReduce: A major step backwards. The Database Column. http://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
 