Then it checks to determine whether it can access Hadoop:
# check that hadoop can be found on the path
if [ $mode = "distributed" ]; then
  if [ $(which hadoop | wc -l ) -eq 0 ]; then
    echo "Can't find Hadoop executable. Add HADOOP_HOME/bin to the path or run in local mode."
    exit -1;
  fi
fi
Given these checks, a crawl in distributed mode uses Hadoop for storage, and the script exits with an error if the hadoop executable is not on the path; in local mode, the Linux file system is used instead.
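The same decision can be sketched as a small standalone function. This is only an illustration, not part of the Nutch script: the function name storage_for_mode and the "hdfs"/"local" return strings are hypothetical, and command -v is swapped in for the which/wc test as a more portable way to look up an executable.

```shell
#!/bin/sh
# Hypothetical helper (not from the Nutch crawl script): report which
# storage a crawl would use for a given mode.
# $1 is assumed to be "local" or "distributed", as in the check above.
storage_for_mode() {
  mode="$1"
  if [ "$mode" = "distributed" ]; then
    # command -v succeeds only if hadoop can be found on the path
    if ! command -v hadoop >/dev/null 2>&1; then
      echo "Can't find Hadoop executable. Add HADOOP_HOME/bin to the path or run in local mode." >&2
      return 1
    fi
    echo "hdfs"
  else
    echo "local"
  fi
}
```

Calling storage_for_mode distributed on a machine without Hadoop on the path prints the error to standard error and returns a nonzero status, which mirrors the exit in the original check.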
You can now run the crawl as follows:
cd $NUTCH_HOME/runtime/deploy/bin
./crawl nutch/urls crawl http://hc1nn:8983/solr/ 2
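To make the positional arguments easier to read, the command can be assembled from named variables. This is a sketch under the assumption that the arguments are, in order, the seed URL directory, the crawl directory, the Solr URL, and the number of crawl rounds; the Injector log lines for urlDir and crawlDb are consistent with the first two. The variable names are illustrative only and do not come from the Nutch script.

```shell
#!/bin/sh
# Illustrative names for the four positional arguments (assumed meanings)
SEED_DIR="nutch/urls"                 # directory holding the seed URL list
CRAWL_DIR="crawl"                     # directory where the crawl data is kept
SOLR_URL="http://hc1nn:8983/solr/"    # Solr instance that receives the index
ROUNDS=2                              # number of crawl rounds

# Build the command line and print it for inspection before running it
CRAWL_CMD="./crawl $SEED_DIR $CRAWL_DIR $SOLR_URL $ROUNDS"
echo "$CRAWL_CMD"
```

Printing the command first lets you confirm the values before starting a crawl that may run for several minutes.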
Running the crawl script gives you the following Nutch output:
14/04/06 16:56:22 INFO crawl.Injector: Injector: starting at 2014-04-06 16:56:22
14/04/06 16:56:22 INFO crawl.Injector: Injector: crawlDb: /user/hadoop/crawl/crawldb
14/04/06 16:56:22 INFO crawl.Injector: Injector: urlDir: nutch/urls
14/04/06 16:56:22 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
14/04/06 16:56:26 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/06 16:56:26 INFO mapred.FileInputFormat: Total input paths to process : 1
14/04/06 16:56:26 INFO mapred.JobClient: Running job: job_201404061342_0056
14/04/06 16:56:27 INFO mapred.JobClient: map 0% reduce 0%
14/04/06 16:56:43 INFO mapred.JobClient: map 50% reduce 0%
14/04/06 16:56:47 INFO mapred.JobClient: map 100% reduce 0%
14/04/06 16:56:51 INFO mapred.JobClient: map 100% reduce 33%
14/04/06 16:56:52 INFO mapred.JobClient: map 100% reduce 100%
14/04/06 16:56:53 INFO mapred.JobClient: Job complete: job_201404061342_0056
............................
14/04/06 17:05:53 INFO mapred.JobClient: Counters: 30
14/04/06 17:05:53 INFO mapred.JobClient: Job Counters
14/04/06 17:05:53 INFO mapred.JobClient: Launched reduce tasks=1
14/04/06 17:05:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=10036
14/04/06 17:05:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/04/06 17:05:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/04/06 17:05:53 INFO mapred.JobClient: Rack-local map tasks=1
14/04/06 17:05:53 INFO mapred.JobClient: Launched map tasks=2
14/04/06 17:05:53 INFO mapred.JobClient: Data-local map tasks=1
14/04/06 17:05:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8334
14/04/06 17:05:53 INFO mapred.JobClient: File Input Format Counters
14/04/06 17:05:53 INFO mapred.JobClient: Bytes Read=3746
14/04/06 17:05:53 INFO mapred.JobClient: File Output Format Counters
14/04/06 17:05:53 INFO mapred.JobClient: Bytes Written=0
14/04/06 17:05:53 INFO mapred.JobClient: FileSystemCounters
14/04/06 17:05:53 INFO mapred.JobClient: FILE_BYTES_READ=6
 