weak consistency, eventual consistency, and time-axis consistency; these properties must generally be traded off against one another. Eventual consistency means that all update operations will eventually propagate through the system, so that all copies become consistent after some period of time. Time-axis consistency means that all copies of a given record apply update operations in the same order.
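The contrast between the two notions can be illustrated with a small sketch (a single-process simulation with names of our own choosing, not an implementation from the text): replicas that apply the same update log in the same order (time-axis consistency) end up identical, while a lagging replica that has only seen a prefix of the log is temporarily divergent but converges once propagation completes (eventual consistency).

```python
# Illustrative sketch (names are ours): eventual vs. time-axis consistency.
# An "update" sets one field of a record; a replica is modeled as a dict.

updates = [("name", "alice"), ("age", 30), ("age", 31)]

def apply_log(log):
    """Apply updates in order; identical logs yield identical replicas."""
    replica = {}
    for key, value in log:
        replica[key] = value
    return replica

# Time-axis consistency: all replicas apply the SAME order -> same state.
r1 = apply_log(updates)
r2 = apply_log(updates)
assert r1 == r2 == {"name": "alice", "age": 31}

# Eventual consistency: a lagging replica has only seen a prefix...
lagging = apply_log(updates[:2])
assert lagging != r1                 # temporarily inconsistent

# ...but once the remaining updates propagate, it converges.
lagging.update(apply_log(updates[2:]))
assert lagging == r1
```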
CAP Option
: The CAP theorem states that a shared data system can achieve at most two of three properties: consistency, availability, and partition tolerance. Cloud databases must replicate data across servers in different regions in order to tolerate regional failures, so partition tolerance is effectively mandatory; the real trade-off is therefore between consistency and availability. At present, various weak consistency models [24] have been proposed to achieve reasonable system availability.
4.3.3 Database Programming Model
The massive datasets of big data are generally stored on hundreds or even thousands of commodity servers. Traditional parallel programming models, such as the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP), are therefore inadequate for parallel programs at this scale.
Several parallel programming models have been proposed for specific fields. These models effectively improve the performance of NoSQL systems and narrow the performance gap with relational databases. They have therefore become the cornerstone of massive data analysis.
4.3.3.1 MapReduce
MapReduce [25] is a simple but powerful programming model for large-scale computing that uses large clusters of commodity PCs to achieve automatic parallelization and distribution. In MapReduce, a computation takes a set of key-value pairs as input and produces a set of key-value pairs as output. The model exposes only two functions, Map and Reduce, both written by the user. The Map function processes the input and generates intermediate key-value pairs. MapReduce then groups all intermediate values associated with the same key and passes them to the Reduce function. The Reduce function receives an intermediate key and its set of values, merges them, and produces a smaller set of values. The advantage of MapReduce is that it hides the complicated machinery of parallel application development, e.g., data scheduling, fault tolerance, and inter-node communication: the user only needs to write these two functions to obtain a parallel application. The initial MapReduce framework did not support multiple datasets in a single task; this shortcoming has been mitigated by recent enhancements [26, 27].
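As a minimal sketch of the model (a single-process word count, not a distributed implementation), the user-supplied Map and Reduce functions might look as follows in Python; the framework's shuffle step, which groups intermediate values by key, is simulated here with a dictionary:

```python
from collections import defaultdict

# User-written Map: one input record -> intermediate key-value pairs.
def map_fn(document):
    for word in document.split():
        yield (word, 1)

# User-written Reduce: a key plus all its intermediate values -> a smaller set.
def reduce_fn(key, values):
    return (key, sum(values))

def mapreduce(documents):
    # Map phase: run map_fn over every input and group results by key
    # (this grouping stands in for the framework's shuffle step).
    intermediate = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            intermediate[key].append(value)
    # Reduce phase: merge each key's value list into a single result.
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

counts = mapreduce(["big data", "big clusters"])
# counts == {"big": 2, "data": 1, "clusters": 1}
```

In a real MapReduce deployment the same two user functions run unchanged, while the framework handles splitting the input, scheduling Map and Reduce tasks across the cluster, and recovering from node failures.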