HBase seems to be running, but you should also check that it is storing its data in HDFS. To do so, use the
Hadoop file system ls command. HBase should have created a storage directory on HDFS called “/hbase”, so you can
list its contents:
[hadoop@hc1nn logs]$ hadoop dfs -ls /hbase
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2014-04-12 19:55 /hbase/-ROOT-
drwxr-xr-x - hadoop supergroup 0 2014-04-12 19:55 /hbase/.META.
drwxr-xr-x - hadoop supergroup 0 2014-04-12 19:57 /hbase/.logs
drwxr-xr-x - hadoop supergroup 0 2014-04-12 19:57 /hbase/.oldlogs
-rw-r--r-- 3 hadoop supergroup 3 2014-04-12 19:55 /hbase/hbase.version
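You may want to repeat this check after restarting the cluster. The following Python sketch parses the ls output and reports which entries from the listing above are missing. It makes three assumptions. First, the `hadoop` command is on the PATH. Second, the same `hadoop dfs -ls` form shown above is used. Third, the directory layout matches this HBase release; later HBase versions lay out /hbase differently. The function names are hypothetical helpers, not part of Hadoop or HBase.

```python
import subprocess

# Entries taken from the /hbase listing above (an older HBase layout);
# adjust this tuple if your HBase version creates different entries.
EXPECTED_ENTRIES = ("-ROOT-", ".META.", ".logs", ".oldlogs", "hbase.version")


def missing_hbase_entries(listing, root="/hbase", expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not appear in an ls listing of root."""
    found = set()
    for line in listing.splitlines():
        fields = line.split()
        # Data lines end with a full path such as /hbase/.META.; the
        # "Found N items" header ends with "items" and is skipped here.
        if fields and fields[-1].startswith(root + "/"):
            found.add(fields[-1][len(root) + 1:])
    return [name for name in expected if name not in found]


def list_hdfs_dir(path="/hbase"):
    """Run the same listing command used in the text and return its stdout."""
    result = subprocess.run(
        ["hadoop", "dfs", "-ls", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the cluster, `missing_hbase_entries(list_hdfs_dir())` should return an empty list when every expected entry is present.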
Gora Configuration
Nutch and Solr are ready, HBase and ZooKeeper are ready, and HBase is storing its data in HDFS. Now it is time to
connect Nutch to HBase using the Apache Gora module that was installed with Nutch 2.x. Gora (gora.apache.org)
provides an in-memory data model and persistence for big data. Because it supports a variety of data stores, it lets
you choose where the data that Nutch collects will be stored. In this section, you will configure Gora to store
Nutch 2.x crawl data in HBase.
You can now set up the Gora connection for Nutch. First, you need to edit the nutch-site.xml file:
[hadoop@hc1r1m2 conf]$ pwd
/usr/local/nutch/conf
[hadoop@hc1r1m2 conf]$ vi nutch-site.xml
Specifically, you add a property called “storage.data.store.class” to specify that HBase will be the default storage
for Nutch Gora. As before, make sure that you add the property so that it sits between the opening and closing
<configuration> tags:
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
<description>Default class for storing data</description>
</property>
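If you manage several nodes, you may prefer to script this edit rather than use vi. The following Python sketch uses the standard-library ElementTree module to add the property, or to update it if it already exists, under the <configuration> root. The function name is hypothetical. Note that ElementTree's default parser drops XML comments and processing instructions, so keep a backup of the original nutch-site.xml.

```python
import xml.etree.ElementTree as ET


def set_nutch_property(config_xml, name, value, description=None):
    """Add or update a <property> under the <configuration> root; return the new XML text."""
    root = ET.fromstring(config_xml)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")
    # Update an existing property with the same name instead of adding a duplicate.
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No match found: append a new property element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        if description is not None:
            ET.SubElement(prop, "description").text = description
    return ET.tostring(root, encoding="unicode")
```

For example, `set_nutch_property(text, "storage.data.store.class", "org.apache.gora.hbase.store.HBaseStore", "Default class for storing data")` produces the same property shown above.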
Next, check the Nutch Ivy configuration. Apache Ivy (http://ant.apache.org/ivy/) is a dependency manager that is
integrated with Apache Ant. It is intended for Java-based projects and is mostly used to manage build dependencies.
[hadoop@hc1r1m2 ivy]$ pwd
/usr/local/nutch/ivy
[hadoop@hc1r1m2 ivy]$ vi ivy.xml
Make sure that the gora-hbase dependency line is uncommented so that the Nutch build pulls in Gora's HBase module,
which provides the HBaseStore class configured above. This is what the line looks like after the change:
<dependency org="org.apache.gora" name="gora-hbase" rev="0.3" conf="*->default" />
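HBaseStore lives in Gora's gora-hbase module, so gora-hbase is the dependency to enable for this setup. The following Python sketch strips the comment markers from that dependency in the ivy.xml text. It assumes the dependency sits on a single line wrapped in its own <!-- ... --> comment. It does not handle a comment block that spans several dependencies. If the line is already active, the text is returned unchanged. The function name is hypothetical.

```python
import re

# A single-line gora-hbase <dependency .../> wrapped in its own comment.
# The non-greedy match stops at the first "/>", so the "->" inside
# conf="*->default" does not end the element early.
_COMMENTED_GORA_HBASE = re.compile(
    r'<!--\s*(<dependency\s+org="org\.apache\.gora"\s+name="gora-hbase"[^\n]*?/>)\s*-->'
)


def enable_gora_hbase(ivy_xml):
    """Return ivy.xml text with the commented-out gora-hbase dependency restored."""
    new_text, count = _COMMENTED_GORA_HBASE.subn(r"\1", ivy_xml)
    if count == 0 and 'name="gora-hbase"' not in ivy_xml:
        raise ValueError("no gora-hbase dependency found in ivy.xml")
    return new_text
```

Comparing the file before and after running the function is a quick way to confirm that only the intended line changed.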
 