Big Data Analytics - Microsoft Big Data Solutions

Introduction to Mahout

The preceding section introduced you at a high level to both data mining and predictive analytics and how they apply to big data. If at this point you are worried that you don't possess the skills or background to successfully build and deliver this type of intelligence within your HDInsight platform, fear not!

The remainder of this chapter introduces you to the Mahout machine learning library and explains how you can use it to deliver meaningful big data analytical solutions without a PhD in statistics or mathematics. So, what is this Mahout thing?

Mahout is an open source, top-level Apache project that encapsulates multiple machine learning algorithms in a single library. Like the Hadoop community, the Mahout community is vibrant and active and has continually expanded and improved the library.

From a historical perspective, the Mahout project grew out of two separate projects: Apache Lucene (an open source text indexing and search project) and Taste (an open source Java library for collaborative filtering).

Mahout supports two basic implementations. The first is a non-distributed, or real-time, implementation in which native Java code calls the Mahout library directly, without Hadoop. The second, which is the one we focus on, runs in a distributed, batch-processing manner on Hadoop. Both scenarios abstract away the complexity of the underlying machine learning algorithms.
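To make the batch scenario more concrete, here is a toy, single-JVM sketch of the map, shuffle, and reduce phases that a Hadoop batch job runs across a cluster. It counts how often pairs of items appear together in shopping baskets. It does not use the Hadoop or Mahout APIs. The class and method names (BatchPairCount, mapBasket, countPairs) are hypothetical, and the Java streams groupingBy collector stands in for Hadoop's shuffle-and-reduce step.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Toy, single-JVM illustration of the map -> shuffle -> reduce shape that
 * Hadoop batch jobs follow. Hypothetical names; NOT Hadoop or Mahout API code.
 */
class BatchPairCount {

    // "Map" phase: for one basket, emit one key per unordered item pair.
    static List<String> mapBasket(List<String> basket) {
        // TreeSet removes duplicates and sorts, so each pair gets one canonical key.
        List<String> sorted = new ArrayList<>(new TreeSet<>(basket));
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            for (int j = i + 1; j < sorted.size(); j++) {
                pairs.add(sorted.get(i) + "|" + sorted.get(j));
            }
        }
        return pairs;
    }

    // "Shuffle" + "reduce" phases: group identical keys and count them.
    static Map<String, Long> countPairs(List<List<String>> baskets) {
        return baskets.stream()
                .flatMap(basket -> mapBasket(basket).stream())
                .collect(Collectors.groupingBy(key -> key, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<List<String>> baskets = List.of(
                List.of("bread", "milk"),
                List.of("bread", "butter", "milk"),
                List.of("butter", "milk"));
        System.out.println(countPairs(baskets));
    }
}
```

Running main prints {bread|butter=1, bread|milk=2, butter|milk=2}. On a real cluster, the map calls would run in parallel on different nodes, and the framework would group the emitted keys before handing them to the reducers.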

Within the context of big data, Mahout is built around four primary use cases:

• Collaborative filtering (recommendation mining based on user behavior)
• Clustering (grouping similar documents)
• Classification (assigning uncategorized documents to predefined categories)
• Frequent item set mining (market basket analysis)
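Collaborative filtering, the first use case in the list, predicts what a user will like from the behavior of similar users. The following minimal user-based sketch is written in plain Java under stated assumptions: ratings live in an in-memory map, similarity is cosine similarity computed over the items both users have rated, and an item's predicted score is the similarity-weighted average of other users' ratings. It is not Mahout's API, and UserBasedSketch, cosine, and recommend are hypothetical names used only for illustration.

```java
import java.util.*;

/**
 * Minimal user-based collaborative filtering sketch (hypothetical, NOT Mahout's API).
 * Assumes in-memory ratings, cosine similarity over co-rated items, and a
 * similarity-weighted average of other users' ratings as the predicted score.
 */
class UserBasedSketch {

    // Cosine similarity computed only over items both users have rated.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb == null) continue; // item not rated by the other user
            dot += e.getValue() * rb;
            normA += e.getValue() * e.getValue();
            normB += rb * rb;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Returns the highest-scoring item the user has not rated yet, or null if none.
    static String recommend(Map<String, Map<String, Double>> ratings, String user) {
        Map<String, Double> mine = ratings.get(user);
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> simSum = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = cosine(mine, other.getValue());
            if (sim <= 0) continue; // skip users with no positive similarity
            for (Map.Entry<String, Double> r : other.getValue().entrySet()) {
                if (mine.containsKey(r.getKey())) continue; // already rated by this user
                weightedSum.merge(r.getKey(), sim * r.getValue(), Double::sum);
                simSum.merge(r.getKey(), sim, Double::sum);
            }
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String item : weightedSum.keySet()) {
            double score = weightedSum.get(item) / simSum.get(item); // weighted average
            if (score > bestScore) {
                bestScore = score;
                best = item;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = Map.of(
                "alice", Map.of("A", 5.0, "B", 1.0),
                "bob", Map.of("A", 5.0, "B", 1.0, "C", 5.0, "D", 1.0),
                "carol", Map.of("A", 1.0, "B", 5.0, "C", 1.0, "D", 5.0));
        System.out.println(recommend(ratings, "alice")); // prints C
    }
}
```

With the sample data, Bob's ratings match Alice's exactly (similarity 1.0), while Carol's mirror them (similarity 10/26). Item C, which Bob rates highly, therefore scores about 3.89 against about 2.11 for D. Production recommenders typically make the similarity metric configurable and limit the neighborhood to the most similar users. This sketch hard-codes cosine similarity and uses every positively similar user to stay short.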

To get started with the Apache Mahout library, you first need to download the project distribution; it is not included by default with HDInsight on