weak consistency, eventual consistency, and time-axis consistency; these properties must generally be traded off against one another. Eventual consistency means that all update operations will eventually propagate through the system, so that all copies become consistent after some period of time. Time-axis consistency means that all copies of a given record apply update operations in the same order.
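The contrast between the two notions can be illustrated with a small sketch (a single-process simulation with names of our own choosing, not an implementation from the text): replicas that apply the same update log in the same order (time-axis consistency) end up identical, while a lagging replica that has only seen a prefix of the log is temporarily divergent but converges once propagation completes (eventual consistency).

```python
# Illustrative sketch (names are ours): eventual vs. time-axis consistency.
# An "update" sets one field of a record; a replica is modeled as a dict.

updates = [("name", "alice"), ("age", 30), ("age", 31)]

def apply_log(log):
    """Apply updates in order; identical logs yield identical replicas."""
    replica = {}
    for key, value in log:
        replica[key] = value
    return replica

# Time-axis consistency: all replicas apply the SAME order -> same state.
r1 = apply_log(updates)
r2 = apply_log(updates)
assert r1 == r2 == {"name": "alice", "age": 31}

# Eventual consistency: a lagging replica has only seen a prefix...
lagging = apply_log(updates[:2])
assert lagging != r1                 # temporarily inconsistent

# ...but once the remaining updates propagate, it converges.
lagging.update(apply_log(updates[2:]))
assert lagging == r1
```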
CAP Option
: The CAP theorem states that a shared data system can achieve at most two of three properties: consistency, availability, and partition tolerance. Cloud databases must replicate data across servers in different regions in order to tolerate regional failures, so partition tolerance is effectively mandatory; the real trade-off is therefore between consistency and availability. At present, various weak consistency models [24] have been proposed to achieve reasonable system availability.
4.3.3 Database Programming Model
The massive datasets of big data are generally stored on hundreds or even thousands of commodity servers. Traditional parallel programming models, such as the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP), are therefore inadequate for parallel programs at this scale.
Several parallel programming models have been proposed for specific fields. These models effectively improve the performance of NoSQL systems and narrow the performance gap with relational databases. They have therefore become the cornerstone of massive data analysis.
4.3.3.1 MapReduce
MapReduce [25] is a simple but powerful programming model for large-scale computing that uses large clusters of commodity PCs to achieve automatic parallelization and distribution. In MapReduce, a computation takes a set of key-value pairs as input and produces a set of key-value pairs as output. The model exposes only two functions, Map and Reduce, both written by the user. The Map function processes the input and generates intermediate key-value pairs. MapReduce then groups all intermediate values associated with the same key and passes them to the Reduce function. The Reduce function receives an intermediate key and its set of values, merges them, and produces a smaller set of values. The advantage of MapReduce is that it hides the complicated machinery of parallel application development, e.g., data scheduling, fault tolerance, and inter-node communication: the user only needs to write these two functions to obtain a parallel application. The initial MapReduce framework did not support multiple datasets in a single task; this shortcoming has been mitigated by recent enhancements [26, 27].
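As a minimal sketch of the model (a single-process word count, not a distributed implementation), the user-supplied Map and Reduce functions might look as follows in Python; the framework's shuffle step, which groups intermediate values by key, is simulated here with a dictionary:

```python
from collections import defaultdict

# User-written Map: one input record -> intermediate key-value pairs.
def map_fn(document):
    for word in document.split():
        yield (word, 1)

# User-written Reduce: a key plus all its intermediate values -> a smaller set.
def reduce_fn(key, values):
    return (key, sum(values))

def mapreduce(documents):
    # Map phase: run map_fn over every input and group results by key
    # (this grouping stands in for the framework's shuffle step).
    intermediate = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            intermediate[key].append(value)
    # Reduce phase: merge each key's value list into a single result.
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

counts = mapreduce(["big data", "big clusters"])
# counts == {"big": 2, "data": 1, "clusters": 1}
```

In a real MapReduce deployment the same two user functions run unchanged, while the framework handles splitting the input, scheduling Map and Reduce tasks across the cluster, and recovering from node failures.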