HADOOP AND OTHER NONRELATIONAL SOURCES
You may have heard the term Big Data around your organization or in the
industry; it usually refers to unstructured or loosely organized data that
arrives in large volumes. These sources come with their own analysis tools,
which can generate powerful results. Take caution, however: this data is
often quite raw and has not been cleansed or organized, so it should be used
for exploratory purposes only until it has been moved to a more stable and
organized framework for specific or in-depth analysis. That analysis can be
done in Hadoop itself, but doing so is not typically suited to beginners who
are more familiar with visual tools.
Map-reduce, the programming model underlying technologies such as Hadoop, is
at its heart a mechanism to split a processing problem into parts, distribute
those parts among nodes, process each part independently (the map part of
the name), and then recombine the intermediate results into a final answer
(the reduce part of the name). A diagram of this process, sourced from
http://code.google.com/p/mapreduce-framework/wiki/MapReduce, is shown in
Figure 4-3.
FIGURE 4-3 Map-reduce (diagram labels: data, computation)
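
To make the map and reduce steps concrete, here is a minimal single-machine
sketch in Python of the classic word-count example. It is an illustration
only and does not use Hadoop's API: the function names (map_phase, shuffle,
reduce_phase, map_reduce) and the sample input are assumptions introduced
for this example. On a real cluster, each chunk would be mapped on a
different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    """Run map over every chunk, group by key, then reduce each group.

    The loop over chunks runs sequentially here; it stands in for the
    work that a cluster would spread across nodes.
    """
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two chunks of raw text, as if stored on two nodes.
chunks = ["big data big results", "raw data"]
counts = map_reduce(chunks)
print(counts)  # {'big': 2, 'data': 2, 'results': 1, 'raw': 1}
```

Because each call to map_phase depends only on its own chunk, the map work
can run in parallel; only the grouping and reduce steps need to see results
from every chunk.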
 