return false;
}
else
return false;
}
catch(Exception e)
{
throw new IOException(
"Caught exception processing input row ", e);
}
}
}
The class extends FilterFunc and implements an exec function. The function
first checks that the tuple passed in is not null and that it has exactly
one member. It then checks that the member is an integer and returns true
if the value is greater than zero; otherwise, it returns false.
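The checks described above can be tried out without a Pig installation. The following is a minimal sketch, not the listing from this page: the class name PositiveIntCheck is hypothetical, and a plain List<Object> stands in for Pig's Tuple so that the example compiles with the JDK alone. The IOException wrapping from the listing is left out because a lookup on a one-element list cannot fail here.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch of the filter logic described above.
// Assumption: List<Object> stands in for Pig's Tuple; this is not a real Pig UDF.
public class PositiveIntCheck {

    // Returns true only when the input holds exactly one Integer greater than zero.
    public static boolean exec(List<Object> input) {
        // Reject a null input or one that does not have exactly one member.
        if (input == null || input.size() != 1) {
            return false;
        }
        Object value = input.get(0);
        // Only Integer members qualify; null and other types are filtered out.
        if (value instanceof Integer) {
            return (Integer) value > 0;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(exec(Arrays.<Object>asList(5)));    // positive integer
        System.out.println(exec(Arrays.<Object>asList(-3)));   // negative integer
        System.out.println(exec(Arrays.<Object>asList("5")));  // not an Integer
        System.out.println(exec(Arrays.<Object>asList(1, 2))); // more than one member
        System.out.println(exec(null));                        // null input
    }
}
```

Only the first call passes the filter. In an actual Pig UDF the same checks would sit inside the exec method of a class that extends FilterFunc, with the tuple access wrapped in the try/catch shown in the listing.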
Other UDF types include aggregation, load, and store functions. The
functions shown here are bare-bones implementations. In practice you also
need to consider error handling, progress reporting, and output schema
typing. For more information on custom UDF creation, consult the UDF manual
on the Apache Pig wiki ( http://wiki.apache.org/pig/UDFManual ).
Using Hive
Another tool available to create and run map-reduce jobs in Hadoop is Hive.
One of the major advantages of Hive is that it creates a relational database
layer over the data files. Using this paradigm, you can work with the data
using traditional querying techniques, which is very beneficial if you have a
SQL background. In addition, you do not have to worry about how the query
is translated into the map-reduce job. A query engine works out the most
efficient way of loading and aggregating the data.
In the following sections you will gain an understanding of how to perform
advanced data analysis with Hive. First you will look at the different types
of built-in Hive functions available. Next, you will see how to extend Hive