Scalable Parallel Processing with MapReduce - Professional NoSQL - page 230


MAPREDUCE POSSIBILITIES AND APACHE MAHOUT

MapReduce can be used to solve a number of problems. Google, Yahoo!, Facebook, and many other

organizations are using MapReduce for a diverse range of use cases including distributed sort, web

link graph traversal, log file statistics, document clustering, and machine learning. In addition, the

variety of use cases where MapReduce is commonly applied continues to grow.
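To make one of these use cases concrete, here is a minimal single-process sketch of the log file statistics case, written in plain Python rather than against Hadoop. The three-field log format and the names map_line, shuffle, reduce_counts, and run_job are assumptions made for illustration only. A real job would distribute the map and reduce steps across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_line(line):
    """Map step: emit (status_code, 1) for one log line.

    Assumes a hypothetical "METHOD PATH STATUS" format; other lines are skipped.
    """
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence within a single process."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Illustrative input only, not real log data.
sample_log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
]
status_counts = run_job(sample_log)
print(status_counts)
```

Because each call to map_line depends only on its own line and each call to reduce_counts depends only on its own key, both steps could run in parallel on separate machines, which is the property MapReduce relies on.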

An open-source project, Apache Mahout, aims to build a complete set of scalable machine learning

and data mining libraries by leveraging MapReduce within the Hadoop infrastructure. I introduce

that project and cover a couple of examples from the project in this section. The motivation for

covering Mahout is to jump-start your quest to explore MapReduce further. I am hoping the

inspiration will help you apply MapReduce effectively for your specific and unique use case.

To get started, go to mahout.apache.org and download the latest release or the source distribution.

The project is continuously evolving and rapidly adding features, so it makes sense to grab the

source distribution to build it. The only tools you need, apart from the JDK, are an SVN client to

download the source and Maven version 3.0.2 or higher to build and install it.

Get the source as follows:

svn co http://svn.apache.org/repos/asf/mahout/trunk

Then change into the downloaded “trunk” source directory and run the following commands to

compile and install Apache Mahout:

mvn compile

mvn install

You may also want to get hold of the Mahout examples as follows:

cd examples

mvn compile

Mahout comes with a taste-web recommender example application. You can change to the taste-web

directory and run mvn package to get the application compiled and running.

Although Mahout is a new project, it contains implementations for clustering, categorization,

collaborative filtering, and evolutionary programming. Explaining what these machine learning

topics mean is beyond the scope of this topic, but I will walk through an elementary example to

show Mahout in use.

Mahout includes a recommendation engine library, named Taste. This library can be used to

quickly build systems that provide user-based and item-based recommendations. The system uses

collaborative filtering.
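The idea behind user-based collaborative filtering can be sketched in a few lines of plain Python. This is not Mahout or Taste code. The preference data, the function names cosine_similarity and recommend, and the choice of cosine similarity are all assumptions made for illustration. The sketch scores items a user has not yet rated by weighting other users' ratings by how similar those users are.

```python
from math import sqrt

# Hypothetical preference data: user -> {item: rating}.
preferences = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 5.0, "B": 3.0, "C": 4.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine_similarity(prefs_a, prefs_b):
    """Cosine similarity over the items both users have rated (0.0 if none)."""
    shared = set(prefs_a) & set(prefs_b)
    if not shared:
        return 0.0
    dot = sum(prefs_a[i] * prefs_b[i] for i in shared)
    norm_a = sqrt(sum(prefs_a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(prefs_b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, prefs, top_n=1):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_prefs in prefs.items():
        if other == user:
            continue
        sim = cosine_similarity(prefs[user], other_prefs)
        if sim <= 0:
            continue
        for item, rating in other_prefs.items():
            if item in prefs[user]:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


print(recommend("alice", preferences))
```

An item-based variant would compare items by the users who rated them instead of comparing users by the items they rated. The similarity measure is a design choice, and a library would typically let you swap it out.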

Taste has five main parts, namely:

➤ DataModel: Model abstraction for storing Users, Items, and Preferences.

➤ UserSimilarity: Interface to define the similarity between two users.
