14/04/13 17:28:10 INFO crawl.InjectorJob: InjectorJob: starting at 2014-04-13 17:28:10
14/04/13 17:28:10 INFO crawl.InjectorJob: InjectorJob: Injecting urlDir: nutch/urls
14/04/13 17:28:11 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
14/04/13 17:28:11 INFO zookeeper.ZooKeeper: Client environment:host.name=hc1r1m2
14/04/13 17:28:11 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_30
14/04/13 17:28:11 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
14/04/13 17:28:11 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre
14/04/13 17:28:11 INFO zookeeper.ZooKeeper: Client
................
14/04/13 17:37:20 INFO mapred.JobClient: Job complete: job_201404131430_0019
14/04/13 17:37:21 INFO mapred.JobClient: Counters: 6
14/04/13 17:37:21 INFO mapred.JobClient: Job Counters
14/04/13 17:37:21 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=54738
14/04/13 17:37:21 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/04/13 17:37:21 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/04/13 17:37:21 INFO mapred.JobClient: Launched map tasks=8
14/04/13 17:37:21 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
You need to monitor all of your logs; that is, you need to monitor the following (see the monitoring sketch after this list):
ZooKeeper logs, in this case under /var/log/zookeeper. These allow you to ensure that all servers are up and running as a quorum.
Hadoop logs, in this case under /usr/local/hadoop/logs. Hadoop and MR must be running without error so that HBase can use Hadoop.
HBase logs, in this case under /usr/local/hbase/logs. These confirm that HBase is running and able to talk to ZooKeeper.
Solr output from the Solr session window. Solr must be running without error so that it can index the crawl output.
Nutch output from the crawl session. Any errors will appear in the session window.
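As a convenience, here is a minimal shell sketch for watching these sources from a single terminal. The directories are the ones used in this installation; the individual log file names are assumptions, since they depend on your hostnames and component versions.
# Follow the main daemon logs; the file names shown are placeholders --
# adjust them to match your own hostnames and versions.
tail -f /var/log/zookeeper/zookeeper.log \
        /usr/local/hadoop/logs/*namenode*.log \
        /usr/local/hadoop/logs/*jobtracker*.log \
        /usr/local/hbase/logs/*master*.log

# Alternatively, sweep all of the logs for recent errors:
grep -i error /var/log/zookeeper/*.log /usr/local/hadoop/logs/*.log /usr/local/hbase/logs/*.log
The Solr and Nutch session windows still need to be watched directly, since their output goes to the terminal rather than to these log directories.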
Each of the components in this architecture must be working for the Nutch crawl to succeed. If you encounter errors, pay particular attention to your configuration. For timeout errors in ZooKeeper, try increasing the tickTime and syncLimit values in the ZooKeeper configuration file (zoo.cfg) on each server, as sketched below.
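As a rough guide only, the timing-related entries in each server's zoo.cfg look like this; the values are illustrative assumptions rather than recommendations, and since initLimit and syncLimit are measured in ticks, raising tickTime relaxes those limits as well.
# zoo.cfg (illustrative values only)
# tickTime is the basic ZooKeeper time unit in milliseconds;
# the limits below are expressed as multiples of it.
tickTime=3000
# initLimit: ticks a follower may take to connect to and sync with the leader
initLimit=10
# syncLimit: ticks a follower may lag behind the leader before being dropped
syncLimit=10
dataDir=/var/lib/zookeeper
clientPort=2181
# quorum members -- the host names here are assumptions; list your own servers
server.1=hc1r1m1:2888:3888
server.2=hc1r1m2:2888:3888
server.3=hc1r1m3:2888:3888
Restart each ZooKeeper server after changing these values so that the whole quorum picks them up.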
Potential Errors
Here are some of the errors that occurred when I tried to use this configuration, along with their causes and solutions. If you encounter them, go back to the step you missed and correct the error. Consider the first one:
2014-04-08 19:05:39,334 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.io.IOException: Couldnt start ZK at requested address of 2181, instead got: 2182. Aborting. Why? Because clients (eg shell) wont be able to find this
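The cause and fix depend on how HBase is configured; one common cause (an assumption here, not something the log alone proves) is HBase trying to start its own ZooKeeper on port 2181 while the external quorum already holds that port, so it falls back to 2182 and aborts. In that situation the usual remedy is to tell HBase not to manage ZooKeeper itself, roughly as follows.
# hbase-env.sh -- stop HBase from launching its own ZooKeeper instance
# (sketch only: applies when an external ZooKeeper quorum already owns port 2181)
export HBASE_MANAGES_ZK=false
With that in place, hbase-site.xml would typically set hbase.cluster.distributed to true and point hbase.zookeeper.quorum at the external ZooKeeper servers.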
 