MapReduce with Cassandra - Beginning Apache Cassandra Development

Database Reference

In-Depth Information

Figure 5-1 . ETL based analytics

In 2008, Google's research program published a paper “MapReduce: Simplified

Data Processing on Large Clusters” that introduced the MapReduce paradigm, though

it had been in use since 2003. (You can download this white paper at ht-

search.google.com/en//archive/mapreduce-osdi04.pdf . ) The The

MapReduce algorithm is a distributed parallel programming model, comprised of the

map and reduce function, and is one solution to business problems that require analyt-

ics over heterogeneous forms of massive amounts of data. Since the Google paper was

published, various MapReduce implementations have been built. One popular and

open-source implementation is Apache Hadoop .

Apache Hadoop

Apache Hadoop is a popular open-source library for distributed computation for large-

scale data sets. It's a large-scale batch processing infrastructure that is primarily meant

to deal with batch analytics over data distributed across hundreds of nodes.

Search WWH ::

Custom Search

Home