Reporting with Hadoop - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

the password for that account, plus I know that the Hive version that I am using for Talend is version 2. That means that the only property I still need to determine is the HiveServer2 port number. Given that I know logs are stored under /var/log on Cloudera CDH5 servers, I obtain that information as follows:

[hadoop@hc2nn hive]$ pwd
/var/log/hive
[hadoop@hc2nn hive]$ ls -l
total 3828
drwx------ 2 hive hive 4096 Aug 31 12:14 audit
-rw-r--r-- 1 hive hive 2116446 Nov 8 09:58 hadoop-cmf-hive-HIVEMETASTORE-hc2nn.semtech-solutions.co.nz.log.out
-rw-r--r-- 1 hive hive 1788700 Nov 8 09:58 hadoop-cmf-hive-HIVESERVER2-hc2nn.semtech-solutions.co.nz.log.out
[hadoop@hc2nn hive]$ grep ThriftCLIService hadoop-cmf-hive-HIVESERVER2-*.log.out | grep listen | tail -2
2014-11-08 09:49:47,269 INFO org.apache.hive.service.cli.thrift.ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:10000
2014-11-08 09:58:58,608 INFO org.apache.hive.service.cli.thrift.ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:10000

The first command shows, via the Linux pwd (print working directory) command, that I am in the directory /var/log/hive. (Note: use the cd command to move to that directory, if necessary.) Next, using the Linux ls command with the -l option to provide a long listing, I check which log files exist in this Hive log directory. I then use the Linux grep command to search the HIVESERVER2 log file for the string ThriftCLIService. I pipe ( | ) the output of this search to a second grep command, which filters it further to lines that also contain the text “listen.” Finally, I limit the output to the last two lines via the Linux tail command with a parameter of -2. The port number that I need appears at the end of each output line. So 10000, the default port number, is the value that will be used in the Talend Hive connection for this section.
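The pipeline above can also be wrapped so that it prints only the port number rather than the whole log line. The following is a minimal sketch, not part of the original procedure: the function name extract_hs2_port is hypothetical, and the sample input is simply the two log lines shown earlier.

```shell
#!/usr/bin/env bash
# Sketch only: extract_hs2_port is a hypothetical helper, not a Hive or CDH tool.
# It reads log lines on stdin and prints the port from the most recent
# "listening on" message written by ThriftCLIService.
extract_hs2_port() {
  grep 'ThriftCLIService' \
    | grep 'listening on' \
    | tail -n 1 \
    | sed -E 's/.*:([0-9]+)[[:space:]]*$/\1/'   # keep only the digits after the last colon
}

# Sample input: the two log lines from the session above.
sample_log='2014-11-08 09:49:47,269 INFO org.apache.hive.service.cli.thrift.ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:10000
2014-11-08 09:58:58,608 INFO org.apache.hive.service.cli.thrift.ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:10000'

port=$(printf '%s\n' "$sample_log" | extract_hs2_port)
echo "HiveServer2 port: $port"
```

Against a real server, the same function could be fed from the log file instead of the sample text, for example by piping the output of cat hadoop-cmf-hive-HIVESERVER2-*.log.out into it from /var/log/hive.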

So, now I am ready to create a Hive database connection. I can do this by right-clicking the DB Connections option in the Repository pane. Then, I select Create DB Connection to open a form that offers a two-step process for creating the connection.

The first step requests the name, purpose, description, and status of the connection. Take care to make the name meaningful. The second step (shown in Figure 11-19) gives the actual connection details: the database type is set to Hive, and the server and port are defined as hc2nn and 10000, as previously determined. The login is set to hadoop, the Linux account on the CentOS host hc2nn, along with its password. The Hive version is set to Hive2, while the Hadoop version and instance are set to match the Hadoop cluster being used, Cloudera/CDH5. Finally, the JDBC string, the Java-based method that Talend will use to connect to Hive, is set to a connection string that uses the hostname, port, and Hive version.
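To make the shape of that connection string concrete, here is a small sketch that assembles a HiveServer2 JDBC URL from the hostname and port determined earlier. The exact string in the Talend form is not reproduced in the text, so this is an assumption based on the standard jdbc:hive2://host:port/database form; the function name hive_jdbc_url is hypothetical, and the database name default is assumed for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: hive_jdbc_url is a hypothetical helper name.
# Arguments: host, port, and an optional database name.
# The database falls back to "default" (an assumption for this example).
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "$2" "${3:-default}"
}

# Host and port as determined in this section.
url=$(hive_jdbc_url hc2nn 10000)
echo "$url"

# If the beeline client is installed, the URL could be tried outside Talend,
# for example (password deliberately left as a placeholder):
#   beeline -u "$url" -n hadoop -p '<password>'
```

The hive2 scheme in the URL is what reflects the Hive version setting; a connection test from the command line before filling in the Talend form can help separate network or port problems from Talend configuration problems.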
