Analytics with Hadoop - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

Next, I install a repository file under /etc/yum.repos.d on hc1r1m1 for Impala, so that the Linux yum command

knows where to find the Cloudera Impala software. The repository file is downloaded from Cloudera's site by using

the Linux wget command:

[root@hc1r1m1 ~]# cd /etc/yum.repos.d

I can examine the contents of this downloaded repository file by using the Linux cat command:

[root@hc1r1m1 yum.repos.d]# cat cloudera-impala.repo

[cloudera-impala]

name=Impala

gpgcheck = 1
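Rather than reading the file by eye, a single setting can be pulled out of it with a small helper. The following is only a sketch: the function name repo_value is hypothetical, and it assumes the usual yum layout of a [section] header followed by key=value lines, with optional spaces around the equals sign as in "gpgcheck = 1".

```shell
#!/bin/sh
# repo_value FILE SECTION KEY   (hypothetical helper, not part of yum)
# Print the value of KEY inside [SECTION] of a yum-style .repo file.
repo_value() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[/ {
            # Section header such as [cloudera-impala]: strip the brackets
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            in_section = ($0 == section)
            next
        }
        in_section {
            pos = index($0, "=")
            if (pos == 0) next              # skip blank and non key=value lines
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)  # trim spaces around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim spaces around the value
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Example call, using the file and section shown in the text:
# repo_value /etc/yum.repos.d/cloudera-impala.repo cloudera-impala gpgcheck
```

If the key or section is absent, the function prints nothing, which makes it easy to test with [ -z ... ] in a script.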

Next, I install the Impala components and the Impala shell by using the yum command as the Linux root user:

[root@hc1r1m1 ~]# yum install impala impala-server impala-state-store impala-catalog impala-shell

This command installs the Impala Catalog server, the Impala server, the Impala State Store server, and the

Impala scripting shell. The Impala server runs on each node in an Impala cluster; it accepts queries and passes data to

and from the files. The Impala scripting shell acts as a client to receive user commands and passes them to the server.

Key to making an Impala cluster robust, the State Store server monitors the state of an Impala cluster and manages the

workload when something goes wrong. The Catalog server manages metadata—that is, data about data—and passes

details about metadata changes to the rest of the cluster.

As soon as the software is installed, it is time to configure it. I copy the Hive hive-site.xml, the HBase hbase-site.xml,

and the Hadoop files core-site.xml and hdfs-site.xml to the Impala configuration area, which I find under /etc/impala/conf.

The dot character ( . ) at the end of the cp (copy) command is just Linux shorthand for the current directory:

[root@hc1r1m1 conf]# cd /etc/impala/conf

[root@hc1r1m1 conf]# cp /etc/hive/conf/hive-site.xml .

[root@hc1r1m1 conf]# cp /etc/hadoop/conf/core-site.xml .

[root@hc1r1m1 conf]# cp /etc/hbase/conf/hbase-site.xml .

[root@hc1r1m1 conf]# cp /etc/hadoop/conf/hdfs-site.xml .
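The four copies can also be wrapped in a loop that stops as soon as a source file is missing, which avoids starting Impala with a partial configuration. This is a sketch only: the function name copy_impala_conf is hypothetical, and the paths in the usage comment are the ones used above.

```shell
#!/bin/sh
# copy_impala_conf DEST SRC...   (hypothetical helper)
# Copy each source file into DEST, stopping at the first failure.
copy_impala_conf() {
    dest="$1"
    shift
    for src in "$@"; do
        if [ ! -r "$src" ]; then
            echo "missing or unreadable: $src" >&2
            return 1
        fi
        cp "$src" "$dest"/ || return 1
    done
}

# Usage with the paths from the text (run as root):
# copy_impala_conf /etc/impala/conf \
#     /etc/hive/conf/hive-site.xml \
#     /etc/hadoop/conf/core-site.xml \
#     /etc/hbase/conf/hbase-site.xml \
#     /etc/hadoop/conf/hdfs-site.xml
```

Passing the destination explicitly, instead of relying on the current directory, means the function behaves the same wherever it is called from.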

To specify the host and port number for the Hive metastore thrift API, as well as to specify a timeout value for

access, I make the following changes to the hive-site.xml file in the Impala configuration area:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hc1r1m1:9083</value>
  <description>
    IP address (or fully-qualified domain name) and port of the metastore host
  </description>
</property>
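After editing, it can be worth confirming that the copy of hive-site.xml under the Impala configuration area really carries the intended value. The sketch below is a line-oriented check, not a real XML parser: the function name hadoop_property is hypothetical, and it assumes that the <name> and <value> elements each sit on their own line, as in the fragment above.

```shell
#!/bin/sh
# hadoop_property FILE NAME   (hypothetical helper)
# Print the <value> that follows <name>NAME</name> in a Hadoop-style
# configuration file. Assumes one element per line.
hadoop_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Drop the 7-character <value> and the 8-character </value>
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
        /<\/property>/ { found = 0 }   # value must be in the same property
    ' "$1"
}

# Example call, using the file and property from the text:
# hadoop_property /etc/impala/conf/hive-site.xml hive.metastore.uris
```

For files that do not follow the one-element-per-line layout, a proper XML tool would be the safer choice.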
