MAPREDUCE POSSIBILITIES AND APACHE MAHOUT
MapReduce can be used to solve a number of problems. Google, Yahoo!, Facebook, and many other
organizations are using MapReduce for a diverse range of use cases including distributed sort, web
link graph traversal, log file statistics, document clustering, and machine learning. In addition, the
variety of use cases where MapReduce is commonly applied continues to grow.
An open-source project, Apache Mahout, aims to build a complete set of scalable machine learning
and data mining libraries by leveraging MapReduce within the Hadoop infrastructure. I introduce
that project and cover a couple of examples from the project in this section. The motivation for
covering Mahout is to jump-start your quest to explore MapReduce further. I am hoping the
inspiration will help you apply MapReduce effectively for your specific and unique use case.
To get started, go to mahout.apache.org and download the latest release or the source distribution.
The project is continuously evolving and rapidly adding features, so it makes sense to grab the
source distribution and build it. The only tools you need, apart from the JDK, are an SVN client to
download the source and Maven version 3.0.2 or higher to build and install it.
Get the source as follows:
svn co http://svn.apache.org/repos/asf/mahout/trunk
Then change into the downloaded “trunk” source directory and run the following commands to
compile and install Apache Mahout:
mvn compile
mvn install
You may also want to compile the Mahout examples as follows:
cd examples
mvn compile
Mahout comes with a taste-web recommender example application. You can change to the
taste-web directory and run mvn package to compile and package the application.
Although Mahout is a new project, it contains implementations for clustering, categorization,
collaborative filtering, and evolutionary programming. Explaining what these machine learning
topics mean is beyond the scope of this topic, but I will walk through an elementary example to
show Mahout in use.
Mahout includes a recommendation engine library named Taste. This library can be used to
quickly build systems that make user-based and item-based recommendations. The system uses
collaborative filtering.
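To make the idea of user-based collaborative filtering concrete, here is a minimal plain-Java sketch. It does not use Mahout or the Taste API. The class name UserBasedSketch, its two methods, and the choice of Pearson correlation as the similarity measure are assumptions made for illustration only. The sketch scores how similarly two users rate the items they share, then estimates a preference for each unrated item as a similarity-weighted average over positively correlated users.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of user-based collaborative filtering; not Mahout's Taste API. */
final class UserBasedSketch {

    /** Pearson correlation over the items both users have rated; 0.0 when it is undefined. */
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        int n = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // only co-rated items contribute
            }
            double x = e.getValue();
            double y = other;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
            n++;
        }
        if (n < 2) {
            return 0.0; // too little overlap to correlate
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        if (varA <= 0 || varB <= 0) {
            return 0.0; // one user rated every shared item identically
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Estimated preference for each item the user has not rated yet. */
    static Map<String, Double> recommend(String user, Map<String, Map<String, Double>> prefs) {
        Map<String, Double> estimates = new HashMap<>();
        Map<String, Double> own = prefs.get(user);
        if (own == null) {
            return estimates; // unknown user: nothing to recommend
        }
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : prefs.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            double sim = similarity(own, other.getValue());
            if (sim <= 0) {
                continue; // ignore uncorrelated and negatively correlated users
            }
            for (Map.Entry<String, Double> p : other.getValue().entrySet()) {
                if (own.containsKey(p.getKey())) {
                    continue; // already rated by this user
                }
                weightedSum.merge(p.getKey(), sim * p.getValue(), Double::sum);
                weightTotal.merge(p.getKey(), sim, Double::sum);
            }
        }
        for (Map.Entry<String, Double> w : weightedSum.entrySet()) {
            estimates.put(w.getKey(), w.getValue() / weightTotal.get(w.getKey()));
        }
        return estimates;
    }
}
```

Calling UserBasedSketch.recommend("alice", prefs) with a map from each user ID to that user's item ratings returns estimated ratings only for items alice has not rated. A production recommender would also bound the neighborhood size and cope with sparse data, which this sketch leaves out.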
Taste has five main parts, namely:
DataModel: Model abstraction for storing Users, Items, and Preferences.
UserSimilarity: Interface to define the similarity between two users.