of this step depends upon the particular implementation of MapReduce that is used, and the exact nature of the distributed data. For example, the data may be distributed over a local cluster of computers (with the use of an implementation such as Hadoop), or it may be geographically distributed because the data was originally created at that location, and it is too expensive to move the data around. The latter scenario is much more likely in the IoT framework. Nevertheless, the steps for collecting the intermediate results from the different Map steps may depend upon the specific implementation and scenario in which the MapReduce framework is used.
The Reduce function is then applied in parallel to each group, which in turn produces a collection of values in the same domain. Next, we apply Reduce(k2, list(v2)) in order to create list(v3). Typically, the Reduce calls over the different keys are distributed over the different nodes, and each such call will return one value, though it is possible for a call to return more than one value. In the previous example, the input to Reduce will be a list of the form (Year, [local_max_1, local_max_2, ..., local_max_r]), where the local maximum values are determined by the execution of the different Map functions. The Reduce function will then determine the maximum value over the corresponding list in each call of the Reduce function.
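The grouping and reduction described above can be sketched in a few lines of single-process Python. This is only an illustration of the pattern, not the API of Hadoop or any other framework; the function names, the sample records, and the shuffle step are all illustrative choices.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (year, temperature) pairs; in a real deployment each
    mapper would see only its own split of the data."""
    for year, temperature in records:
        yield year, temperature

def shuffle(pairs):
    """Group intermediate values by key, producing (k2, list(v2))."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Each Reduce call over a key returns one value: the maximum
    over the list of local maxima for that year."""
    return {year: max(values) for year, values in groups.items()}

# Illustrative input: (year, local maximum) pairs from different mappers.
records = [(2019, 31), (2019, 35), (2020, 28), (2020, 33), (2020, 30)]
result = reduce_phase(shuffle(map_phase(records)))
# result == {2019: 35, 2020: 33}
```

In a distributed setting the shuffle step is performed by the framework itself, which routes all values with the same key to the node that executes the corresponding Reduce call.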
The MapReduce framework is very powerful in terms of enabling distributed search and indexing capabilities across the semantic web. An overview paper in this direction [77] explores the various data processing capabilities of MapReduce used by Yahoo! for enabling efficient search and indexing. The MapReduce framework has also been used for distributed reasoning across the semantic web [104, 105]. The work in [105] addresses the issue of semantic web compression with the use of the MapReduce framework. The work is based on the fact that, since the number of RDF statements is rapidly increasing over time (because of a corresponding increase in the number of "things"), the compression of these statements would be useful for storage and retrieval. One of the most often used techniques for compressing data is called dictionary encoding. It has been experimentally estimated that statements on the semantic web require about 150-210 bytes. If each of the three terms in a statement is replaced with an 8-byte number, the same statement requires only 24 bytes, which is a significant saving. The work in [105] presents methods for performing this compression with the use of the MapReduce framework. Methods for computing the closure of the RDF graph with the use of the MapReduce framework are proposed in [104].
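The dictionary-encoding idea can be sketched as follows: each distinct term (URI or literal) in the triples is assigned a fixed-width integer identifier, so a 150-210 byte statement shrinks to three 8-byte numbers. This is a minimal in-memory sketch, not the distributed method of [105]; the sample triples and function names are illustrative.

```python
def build_dictionary(triples):
    """Assign a unique integer ID to every distinct term."""
    dictionary = {}
    for triple in triples:
        for term in triple:
            if term not in dictionary:
                dictionary[term] = len(dictionary)
    return dictionary

def encode(triples, dictionary):
    """Replace each term of each (subject, predicate, object) triple
    with its integer ID; each ID would be stored as an 8-byte number."""
    return [tuple(dictionary[t] for t in triple) for triple in triples]

# Illustrative RDF statements (hypothetical example.org URIs).
triples = [
    ("<http://example.org/sensor/42>",
     "<http://example.org/hasReading>", '"21.5"'),
    ("<http://example.org/sensor/42>",
     "<http://example.org/locatedIn>", "<http://example.org/room/7>"),
]
dictionary = build_dictionary(triples)
encoded = encode(triples, dictionary)
# encoded == [(0, 1, 2), (0, 3, 4)]
```

Repeated terms such as the sensor URI above are stored only once in the dictionary, which is where most of the compression comes from; the challenge addressed in [105] is building such a dictionary in a distributed fashion with MapReduce.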
The Hadoop implementation of the MapReduce framework is an open source implementation provided by Apache. This framework implements