Database Reference
In-Depth Information
Classification algorithms, given a set of observations on an individual, predict some un-
known outcome. If the outcome is a binary variable, logistic regression can be used to predict
the probability of that outcome. For example, given the set of lab results of a patient, predict
the probability that the patient has a given disease. If the outcome is a numeric variable, lin-
ear regression can be used to predict the value of that outcome. For example, given this
month's economic conditions, predict the unemployment rate for next month.
Clustering algorithms don't really answer a question. You frequently use them in the first
stage of your analysis to get a feel for the data.
Data analytics is a deep topic—too deep to discuss in any detail here. O'Reilly has an excel-
lent series of topics on the topic of data analytics.
Most of the analytics just discussed deal with numerical or categorical data. Increasingly im-
portant in the Hadoop world are text analytics and geospatial analytics.
Pig
License
Apache License, Version 2.0
Activity
High
Purpose
High-level data flow language for processing data
Official Page
http://pig.apache.org
Hadoop Integration Fully Integrated
If MapReduce code in Java is the “assembly language” of Hadoop, then Pig is analogous to
Python or another high-level language. Why would you want to use Pig rather than MapRe-
duce? Writing in Pig may not be as performant as writing mappers and reducers in Java, but
Search WWH ::




Custom Search