Database Reference
In-Depth Information
Figure 5-1 . ETL based analytics
In 2008, Google's research program published a paper “MapReduce: Simplified
Data Processing on Large Clusters” that introduced the MapReduce paradigm, though
it had been in use since 2003. (You can download this white paper at ht-
tp://static.googleusercontent.com/media/re-
search.google.com/en//archive/mapreduce-osdi04.pdf . ) The The
MapReduce algorithm is a distributed parallel programming model, comprised of the
map and reduce function, and is one solution to business problems that require analyt-
ics over heterogeneous forms of massive amounts of data. Since the Google paper was
published, various MapReduce implementations have been built. One popular and
open-source implementation is Apache Hadoop .
Apache Hadoop
Apache Hadoop is a popular open-source library for distributed computation for large-
scale data sets. It's a large-scale batch processing infrastructure that is primarily meant
to deal with batch analytics over data distributed across hundreds of nodes.
 
Search WWH ::




Custom Search