8.3.1 Cloud Data Access Scheme
Data access schemes for a cloud infrastructure perform two important
tasks: administering the distribution of data across different networks and
providing data services to remote clients. In this section, we discuss a
cloud data access scheme based on Google's MapReduce technique.
Google's MapReduce is a programming model intended for large-scale data
processing in a massively parallel manner. It was developed to solve issues
involving the parallelization of computational processes and the distribution
of data across heterogeneous networks. The MapReduce implementation also
addresses load balancing, network performance, and fault tolerance issues [95].
The MapReduce programming model was inspired by the map and reduce
primitives found in Lisp and other functional languages. It involves two
functions: map and reduce. The map function, written by the user, takes an
input key/value pair and produces a set of intermediate key/value pairs. The
MapReduce library groups all intermediate values associated with the same
intermediate key and passes them to the reduce function. The reduce function,
also written by the user, merges the intermediate values for that key to form
a possibly smaller set of values. Typically, each invocation of the reduce
function produces zero or one output value.
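The flow described above (map, group by intermediate key, reduce) can be sketched in a few lines of Python. This is only a single-process illustration, not Google's implementation: the helper name map_reduce, the word-counting functions, and the sample documents are all assumptions introduced here for the example.

```python
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    """Single-process sketch of the map -> group -> reduce flow (hypothetical helper)."""
    groups = defaultdict(list)
    # Map phase: each input pair yields zero or more intermediate key/value pairs.
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            # Grouping step: collect values that share an intermediate key.
            groups[ikey].append(ivalue)
    # Reduce phase: one reduce call per intermediate key.
    return {ikey: reduce_fn(ikey, ivalues) for ikey, ivalues in groups.items()}

def count_words_map(doc_name, text):
    # Emit (word, 1) for every word in the document.
    for word in text.split():
        yield word.lower(), 1

def count_words_reduce(word, counts):
    # Merge all counts for one word into a single value.
    return sum(counts)

# Hypothetical input pairs: (document name, document text).
documents = [("doc1", "the quick fox"), ("doc2", "the lazy dog and the fox")]
word_counts = map_reduce(documents, count_words_map, count_words_reduce)
print(word_counts)
```

In a real deployment the map calls and the per-key reduce calls would run on different machines; the sketch keeps them in one loop only to make the data flow visible.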
Consider the following examples of map and reduce functions. Given a
multiplication operation in a function f(z), the following expressions
illustrate both the map and reduce applications on the list (2, 4, 6):
f(z) = map(×2, (2, 4, 6)) → ((2×2), (4×2), (6×2)) = (4, 8, 12)
f(z) = reduce(×, (2, 4, 6)) → ((2×4)×6) = 48
Note that the map function can apply the operation to all the inputs in
parallel, whereas the reduce function in this example combines the values
sequentially from left to right.
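The two expressions above can be reproduced with Python's built-in map and functools.reduce. This is a minimal sketch; the variable names are chosen here for illustration.

```python
from functools import reduce
from operator import mul

values = (2, 4, 6)

# map: apply "multiply by 2" to each element independently.
doubled = tuple(map(lambda z: z * 2, values))

# reduce: fold the elements with multiplication, left to right: ((2*4)*6).
product = reduce(mul, values)

print(doubled)  # (4, 8, 12)
print(product)  # 48
```

Each element passed to map is processed without reference to the others, which is what makes the map step easy to parallelize; reduce carries an accumulated value from one step to the next.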
In the data access mechanism, the map and reduce functions are used to
retrieve data from a collection of distributed repositories. The map function
extracts the desired information based on a condition set by the user (for
example, the condition within an SQL query). It works at the atomic level of
the data (a tuple or a file). The reduce function performs an operation on the
data retrieved by the map function and produces a set of values or a single
value, as required by the user.
An important feature of MapReduce is its ability to parallelize
operations by working on each individual data item and performing these tasks
on-site. Consider the following example. Suppose there is a set of data
related to employees' personal details, as shown in Table 8.3. An SQL query is
performed to retrieve the average salary per department for executive
employees as follows:
With this SQL query, MapReduce will conduct the map operation to obtain
the name and salary amount of each employee in a department. Subsequently,
the reduce function will calculate the average salary for each department.
Figure 8.9 shows these operations.
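The average-salary example can be sketched in Python as follows. Table 8.3 is not reproduced here, so the records, the field names (name, department, position, salary), and the two "sites" are hypothetical stand-ins. The sketch also assumes that the map step emits the department as the intermediate key, so that the reduce step can average the salaries per department.

```python
from collections import defaultdict

# Hypothetical records standing in for Table 8.3, split across two repositories.
site_a = [
    {"name": "Ann", "department": "Sales", "position": "Executive", "salary": 5000},
    {"name": "Bob", "department": "Sales", "position": "Clerk", "salary": 2000},
]
site_b = [
    {"name": "Cid", "department": "Sales", "position": "Executive", "salary": 7000},
    {"name": "Dee", "department": "IT", "position": "Executive", "salary": 6000},
]

def map_employee(record):
    # Works on one tuple at a time; keeps only executive employees.
    if record["position"] == "Executive":
        yield record["department"], record["salary"]

def reduce_average(department, salaries):
    # Merges all salaries for one department into a single average.
    return sum(salaries) / len(salaries)

def average_salary_by_department(sites):
    grouped = defaultdict(list)
    for site in sites:            # each site could be processed in parallel
        for record in site:
            for department, salary in map_employee(record):
                grouped[department].append(salary)
    return {dept: reduce_average(dept, sal) for dept, sal in grouped.items()}

averages = average_salary_by_department([site_a, site_b])
print(averages)
```

The filtering condition lives entirely in the map function, which mirrors the role of the condition in the SQL query; only the matching (department, salary) pairs need to leave each site.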
Some problems arise in this type of processing configuration. For example,
the map function conducts its operation assuming that data are distributed