Data Migration and Analytics - Beginning Apache Cassandra Development


Figure 6-14. All Hadoop processes are running

Next, you need to set up and run Hive:

1. First, download the latest Hive release tarball from http://apache.mirrors.tds.net/hive/.

2. Extract the tarball in a local folder and set that folder as HIVE_HOME.

3. Set HADOOP_HOME in the $HIVE_HOME/bin/hive.sh file.

4. When this configuration of Hive is complete, start the Hive shell by running $HIVE_HOME/bin/hive.sh.

Understanding UDF, UDAF, and UDTF

Hive comes with built-in user-defined functions (UDFs), user-defined aggregate functions (UDAFs), and user-defined table functions (UDTFs). Using the Hive shell, we can fetch a list of available functions and also describe them:

SHOW FUNCTIONS;

DESCRIBE FUNCTION <function_name>;

DESCRIBE FUNCTION EXTENDED <function_name>;

The UDFs built in to Hive include functions like round(), pow(), and rand(). There are also built-in collection functions such as map_keys and map_values, which return unordered arrays of a map's keys and values respectively. For more about UDFs, refer to the Built-in Functions section of the Hive wiki page LanguageManual+UDF#LanguageManualUDF-Built-inFunctions.
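As a minimal sketch, the built-in functions named above could be tried directly from the Hive shell. The literal arguments are illustrative only, and the query assumes a Hive version that accepts SELECT without a FROM clause (0.13 or later); on older versions, select from any existing table instead.

```sql
-- Scalar UDFs: each call takes one row's values and returns one value.
SELECT round(3.14159, 2),   -- round to two decimal places
       pow(2, 10),          -- 2 raised to the 10th power
       rand();              -- pseudo-random number between 0 and 1

-- Collection UDFs: map() builds a map literal from key/value pairs.
-- The order of the returned array elements is not guaranteed.
SELECT map_keys(map('a', 1, 'b', 2)),
       map_values(map('a', 1, 'b', 2));
```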

Among the built-in UDAFs supported by Hive are functions such as count, min, max, and percentile. For a detailed list of supported aggregate functions, you can also refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF).
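The aggregate functions above collapse many rows into a single result. The sketch below assumes a hypothetical table page_views with a STRING column user_id and a BIGINT column duration_ms; neither name comes from the text. Note that percentile expects an integral column, which is why duration_ms is assumed to be BIGINT.

```sql
-- Hypothetical table: page_views(user_id STRING, duration_ms BIGINT)

-- One summary row for the whole table.
SELECT count(*)                     AS views,
       min(duration_ms)             AS shortest_ms,
       max(duration_ms)             AS longest_ms,
       percentile(duration_ms, 0.5) AS median_ms
FROM page_views;

-- The same aggregates computed per user.
SELECT user_id,
       count(*)         AS views,
       max(duration_ms) AS longest_ms
FROM page_views
GROUP BY user_id;
```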

Also, there are the table-generating UDTFs, which turn a single input row into multiple output rows. For example, the explode function generates a row for each array element. Further details and their usage are available at ht-
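To illustrate how explode generates a row for each array element, here is a short sketch. The table posts, with an INT column id and an ARRAY<STRING> column tags, is a hypothetical example and not part of the text; the first query again assumes a Hive version that allows SELECT without FROM.

```sql
-- explode() on an array literal: yields three rows, one per element.
SELECT explode(array(1, 2, 3)) AS item;

-- Hypothetical table: posts(id INT, tags ARRAY<STRING>)
-- LATERAL VIEW joins each generated row back to the row it came from,
-- so a post with three tags produces three (id, tag) rows.
SELECT p.id, t.tag
FROM posts p
LATERAL VIEW explode(p.tags) t AS tag;
```

LATERAL VIEW is used in the second query because a UDTF in the SELECT list cannot be combined with other columns on its own.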
