with custom map-reduce scripts written in Python. Then you will go one
step further and create a UDF to extend the functionality of Hive.
Data Analysis with Hive
One strength of HiveQL is its large set of built-in functions for data
analysis, including mathematical, collection, type conversion, date, and
string functions. Most of the functions found in standard SQL are also
available in HiveQL. For example, the following HiveQL query counts the
flights and finds the maximum delay at each airport in the flightdata
table. Figure 9.12 shows the output in the Hive console:
SELECT airport_cd, COUNT(*), MAX(delay)
FROM flightdata
GROUP BY airport_cd;
Figure 9.12 Flight counts and maximum delays
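To experiment with the shape of this query without a Hive cluster, the following minimal sketch runs the same aggregation from Python against an in-memory SQLite table. SQLite stands in for Hive here only because it accepts the same GROUP BY syntax; it is not Hive, and the sample rows and the helper name summarize_flights are invented for illustration rather than taken from the flightdata table used in this chapter. An ORDER BY clause is added so the result order is predictable.

```python
import sqlite3

# Hypothetical sample rows: (airport_cd, delay in minutes).
SAMPLE_ROWS = [
    ("ORD", 12),
    ("ORD", 45),
    ("ATL", 5),
    ("ATL", 30),
    ("ATL", 18),
    ("SEA", 0),
]


def summarize_flights(rows):
    """Return (airport_cd, flight_count, max_delay) per airport, sorted by code."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed two-column layout; the real flightdata table may differ.
        conn.execute("CREATE TABLE flightdata (airport_cd TEXT, delay INTEGER)")
        conn.executemany("INSERT INTO flightdata VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT airport_cd, COUNT(*), MAX(delay) "
            "FROM flightdata GROUP BY airport_cd ORDER BY airport_cd"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for airport_cd, flight_count, max_delay in summarize_flights(SAMPLE_ROWS):
        print(airport_cd, flight_count, max_delay)
```

Each returned tuple corresponds to one output row of the HiveQL query: the airport code, the number of flights, and the largest delay recorded for that airport.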
Types of Hive Functions
Hive has several flavors of functions you can work with, including the
following:
• UDFs (user-defined functions)
• UDAFs (user-defined aggregate functions)
• UDTFs (user-defined table-generating functions)
UDFs operate on one row at a time and include functions for type
conversion, math, string manipulation, and date/time handling.
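The three kinds of functions differ mainly in how many rows go in and how many come out. The following sketch illustrates those shapes with plain Python functions. It does not use Hive's actual UDF interfaces; the names to_upper, max_delay, and explode_items are hypothetical stand-ins, loosely modeled on Hive's built-in upper(), max(), and explode().

```python
def to_upper(value):
    """UDF shape: one value from one row in, one value out."""
    return value.upper()


def max_delay(delays):
    """UDAF shape: values from many rows in, one aggregated value out."""
    result = None
    for delay in delays:
        if result is None or delay > result:
            result = delay
    return result


def explode_items(items):
    """UDTF shape: one row holding a collection in, one output row per element."""
    for item in items:
        yield (item,)


# Hypothetical sample rows: (airport_cd, delay in minutes).
rows = [("ord", 12), ("atl", 30), ("ord", 45)]

if __name__ == "__main__":
    # Row by row: each input row produces exactly one output value.
    print([to_upper(code) for code, _ in rows])
    # Aggregate: all rows collapse into a single value.
    print(max_delay(delay for _, delay in rows))
    # Table-generating: one collection expands into several rows.
    print(list(explode_items(["a", "b", "c"])))
```

In Hive itself, the engine decides when such a function is called; the sketch only shows the input and output contract that each kind follows.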
 
 