3. Copy the JAR file to the cluster.
4. Run the analytic.
5. Find a bug, go back and write some more code.
As you can imagine, this process can be time-consuming, and tinkering with the code can
disrupt thinking about the business problem. Fortunately, a robust ecosystem of tools for
working with Hadoop and MapReduce has emerged to simplify this process and allow your
analysts to spend more time thinking about the business problem at hand.
As you'll see, these tools generally do a few things:
▪ Provide a simpler, more familiar interface to MapReduce
▪ Generate immediate feedback by allowing users to build queries interactively
▪ Simplify complex operations
Analytic Libraries
While much analysis can be done in MapReduce or Pig, some machine-learning algorithms
are available ready-made as part of the Apache Mahout project. Examples of the kinds of
problems suited to Mahout are classification, recommendation, and clustering.
You point machine-learning algorithms at a dataset, and they “learn” something from the
data. They fall into two classes: supervised and unsupervised. In supervised learning, the
data typically has a set of observations and an outcome value. For example, clinical data
about patients would be the observations, and an outcome value might be the presence of a
disease. A supervised-learning algorithm, given a new patient's clinical data, would try to
predict the presence of a disease. Unsupervised algorithms do not use a given outcome, and
instead attempt to find some hidden pattern in the data. For example, we could take a set of
observations of clinical data from patients and try to see if they tend to cluster, so that points
inside a cluster would be “close” to one another and the cluster centers would be far from
one another. The interpretation of the cluster is not given by the algorithm and is left for the
data analyst to discover. You can find the list of supported algorithms on the Mahout home
page.
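To make the clustering idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only and does not use Mahout's API. The function name kmeans, the patient values, and the choice of two starting centers are all assumptions made up for this example.

```python
import math

def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm on a list of numeric tuples; returns (centers, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
                continue
            new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        centers = new_centers
    return centers, labels

# Hypothetical observations: (age, systolic blood pressure) for six patients.
patients = [(25, 110), (30, 115), (28, 112), (65, 150), (70, 155), (68, 148)]
centers, labels = kmeans(patients, centers=[patients[0], patients[3]])
```

The algorithm returns only a label per patient and the cluster centers. Deciding what each group means (for example, a younger group with lower blood pressure) is still left to the analyst.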
Recommendation algorithms answer the following question: based on other people's ratings,
and how similar those people are to you, what would you be likely to rate highly?
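One simple way to approach that question is user-based collaborative filtering: weight each other person's ratings by how similar they are to you, then rank the items you have not rated. The sketch below is plain Python and is not Mahout's recommender API. The function names, the use of cosine similarity, and the ratings data are assumptions chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users' ratings, over items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(ratings, user):
    """Rank items the user has not rated by similarity-weighted average rating."""
    weighted, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap or no similarity
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                weighted[item] = weighted.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scores = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "you": {"A": 5, "B": 1},
    "ann": {"A": 5, "B": 1, "C": 5, "D": 2},
    "bob": {"A": 1, "B": 5, "C": 1, "D": 5},
}
suggestions = recommend(ratings, "you")
```

Here "ann" rates the shared items exactly as you do, so her opinions count most, and item C (which she rated highly) is ranked ahead of item D.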