Figure 6-14. Shows that all Hadoop processes are running
Next, you need to set up and run Hive:
1. First, download the latest release tarball from ht-
2. Extract the tarball in a local folder and set that local folder as HIVE_HOME.
3. Set HADOOP_HOME in the $HIVE_HOME/bin/hive.sh file.
4. When this configuration of Hive is complete, we can start the Hive shell by running:
   $HIVE_HOME/bin/hive.sh
Understanding UDF, UDAF, and UDTF
Hive comes with built-in user-defined functions (UDF), user-defined aggregate functions (UDAF), and user-defined table functions (UDTF). Using the Hive shell, we can fetch a list of available functions and also describe them:
SHOW FUNCTIONS;
DESCRIBE FUNCTION <function_name>;
DESCRIBE FUNCTION EXTENDED <function_name>;
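As a concrete illustration of the commands above, the following sketch looks up two of the built-in functions by name. The function names round and explode are simply picked as examples; the exact help text printed depends on the Hive version installed.

```sql
-- Print the one-line signature and summary for round
DESCRIBE FUNCTION round;

-- Print the longer description, usually with an example, for explode
DESCRIBE FUNCTION EXTENDED explode;
```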
The UDFs built in to Hive include functions like round(), pow(), and rand().
And there are built-in collection functions such as map_keys and map_values, which return unordered lists of a map's keys and values, respectively. For more about UDFs, refer to the Built-in Functions section of the Hive wiki page LanguageManual+UDF#LanguageManualUDF-Built-inFunctions.
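To make the built-in UDFs more tangible, here is a minimal sketch that calls them on literal values from the Hive shell. It assumes a Hive version that accepts SELECT without a FROM clause (0.13 or later); on older versions, the same expressions would need to be selected from an existing table. The literal map built with map() is only illustrative data.

```sql
-- Scalar UDFs return one value for each input row
SELECT round(3.14159, 2), pow(2, 10), rand();

-- Collection functions applied to a literal map with two entries
SELECT map_keys(map('a', 1, 'b', 2)), map_values(map('a', 1, 'b', 2));
```

The first query rounds to two decimal places, raises 2 to the 10th power, and draws a random number between 0 and 1. The second returns the keys and the values of the map as two arrays, with no ordering guarantee.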
Among the built-in UDAFs supported by Hive are functions such as count, min, max, and percentile. For a detailed list of supported aggregate functions, you can also refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF).
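The aggregate functions named above could be combined in a single query, as in the following sketch. The table page_views and its columns are hypothetical and exist only for this illustration. Note that percentile works on integer columns, which is why latency_ms is declared as INT here.

```sql
-- Hypothetical table used only for illustration
CREATE TABLE IF NOT EXISTS page_views (user_id INT, latency_ms INT);

-- Each UDAF collapses all rows of the table into a single value
SELECT count(*)                    AS total_views,
       min(latency_ms)             AS fastest_ms,
       max(latency_ms)             AS slowest_ms,
       percentile(latency_ms, 0.5) AS median_ms
FROM page_views;
```

Adding a GROUP BY user_id clause, and selecting user_id alongside the aggregates, would produce one such summary row per user instead of one for the whole table.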
Also, there are the table-generating UDTFs, which can emit multiple output rows for a single input row. For example, the explode function generates a row for each array element. Further details and their usage are available at
ht-