of this step depends upon the particular implementation of MapReduce
that is used, and on the exact nature of the distributed data. For example,
the data may be distributed over a local cluster of computers (with the
use of an implementation such as Hadoop), or it may be geographically
distributed because the data was originally created at that location, and
it is too expensive to move it around. The latter scenario is much
more likely in the IoT framework. In either case, the steps for collecting
the intermediate results from the different Map steps may depend
upon the specific implementation and scenario in which the MapReduce
framework is used.
The Reduce function is then applied in parallel to each group, which in
turn produces a collection of values in the same domain. Next, we apply
Reduce(k2, list(v2)) in order to create list(v3). Typically, the Reduce
calls over the different keys are distributed over the different nodes, and
each such call will return one value, though it is possible for a call to
return more than one value. In the previous example, the input to Reduce
will be a list of the form (Year, [local_max_1, local_max_2, ..., local_max_r]),
where the local maximum values are determined by the execution of the
different Map functions. The Reduce function will then determine the
maximum value over the corresponding list in each call of the Reduce
function.
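As a sketch, the yearly-maximum example can be simulated on a single machine. The shard contents and the helper names below are hypothetical; a real deployment would run the map and reduce calls on different nodes of the cluster:

```python
from collections import defaultdict

def map_fn(records):
    """Map step: over one shard of (year, temp) pairs, emit (year, local_max)."""
    local_max = {}
    for year, temp in records:
        if year not in local_max or temp > local_max[year]:
            local_max[year] = temp
    return list(local_max.items())

def reduce_fn(year, local_maxima):
    """Reduce step: global maximum over the local maxima reported for one year."""
    return year, max(local_maxima)

# Two shards, as the data might be partitioned over a cluster (hypothetical values).
shards = [
    [(2001, 30), (2001, 35), (2002, 20)],
    [(2001, 32), (2002, 25), (2002, 22)],
]

# Shuffle phase: group the intermediate (k2, v2) pairs by key,
# producing (Year, [local_max_1, ..., local_max_r]) lists.
grouped = defaultdict(list)
for shard in shards:
    for year, local_max in map_fn(shard):
        grouped[year].append(local_max)

result = dict(reduce_fn(y, vs) for y, vs in grouped.items())
print(result)  # {2001: 35, 2002: 25}
```

Here each Reduce call returns exactly one value per key, matching the description above; frameworks such as Hadoop perform the grouping (the "shuffle") automatically between the two phases.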
The MapReduce framework is very powerful in terms of enabling dis-
tributed search and indexing capabilities across the semantic web. An
overview paper in this direction [77] explores the various data processing
capabilities of MapReduce used by Yahoo! for enabling efficient search
and indexing. The MapReduce framework has also been used for dis-
tributed reasoning across the semantic web [104, 105]. The work in
[105] addresses the issue of semantic web compression with the use of
the MapReduce framework. The work is based on the observation that, since
the number of RDF statements is rapidly increasing over time (because of
a corresponding increase in the number of "things"), the compression of
these statements would be useful for storage and retrieval. One of the most
often used techniques for compressing data is called dictionary encoding.
It has been experimentally estimated that statements on the
semantic web require about 150-210 bytes each. If each of the three terms
in a statement is replaced with an 8-byte number, the same statement
requires only 24 bytes, which is a significant saving. The work in [105]
presents methods for performing this compression with the use of the
MapReduce framework. Methods for computing the closure of the RDF
graph with the use of the MapReduce framework are proposed in [104].
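The core idea of dictionary encoding can be sketched on a single machine: each distinct term (a long URI or literal string) is assigned a compact integer ID, and triples are stored as ID tuples. The URIs below are hypothetical, and [105] distributes this process with MapReduce rather than running it sequentially as here:

```python
def dictionary_encode(triples):
    """Replace each RDF term with an integer ID.

    A triple of roughly 150-210 bytes of text becomes three fixed-width
    IDs (e.g. 3 x 8 bytes = 24 bytes); the dictionary is stored once.
    """
    term_to_id = {}
    encoded = []
    for triple in triples:
        ids = []
        for term in triple:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)  # next unused ID
            ids.append(term_to_id[term])
        encoded.append(tuple(ids))
    return encoded, term_to_id

# Hypothetical RDF statements about an IoT sensor.
triples = [
    ("http://example.org/sensor/42",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://example.org/TemperatureSensor"),
    ("http://example.org/sensor/42",
     "http://example.org/hasReading",
     '"21.5"'),
]

encoded, dictionary = dictionary_encode(triples)
print(encoded)  # [(0, 1, 2), (0, 3, 4)]
```

Note that the repeated subject URI is encoded only once in the dictionary, so the saving grows as terms recur across many statements, which is exactly the situation on the semantic web.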
The Hadoop implementation of the MapReduce framework is an open-
source implementation provided by Apache. This framework implements