Then you can check that these links exist by creating a long Linux listing using ls -l:
[hadoop@hc1nn conf]$ ls -l
lrwxrwxrwx. 1 hadoop hadoop 36 Apr 5 14:15 core-site.xml -> /usr/local/hadoop/conf/core-site.xml
lrwxrwxrwx. 1 hadoop hadoop 36 Apr 5 14:16 hadoop-env.sh -> /usr/local/hadoop/conf/hadoop-env.sh
lrwxrwxrwx. 1 hadoop hadoop 36 Apr 5 14:16 hdfs-site.xml -> /usr/local/hadoop/conf/hdfs-site.xml
lrwxrwxrwx. 1 hadoop hadoop 38 Apr 5 14:16 mapred-site.xml -> /usr/local/hadoop/conf/mapred-site.xml
lrwxrwxrwx. 1 hadoop hadoop 30 Apr 5 14:16 masters -> /usr/local/hadoop/conf/masters
lrwxrwxrwx. 1 hadoop hadoop 29 Apr 5 14:16 slaves -> /usr/local/hadoop/conf/slaves
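If any of these links is missing, you can recreate the full set in one pass. This is a minimal sketch, assuming the Hadoop and Nutch installation paths used throughout this section:

```shell
# Recreate the Hadoop configuration links in the Nutch conf directory.
# Paths are the ones used in this section; adjust for your installation.
HADOOP_CONF=/usr/local/hadoop/conf
NUTCH_CONF=/usr/local/nutch/conf

cd "$NUTCH_CONF"
for f in core-site.xml hadoop-env.sh hdfs-site.xml mapred-site.xml masters slaves; do
  ln -sf "$HADOOP_CONF/$f" "$f"   # -f replaces any stale link in place
done
ls -l   # each entry should point at its Hadoop conf counterpart
```

Run this as the hadoop user so the links carry the ownership shown in the listing above.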
Next, you make some additions to the nutch-site.xml configuration file, as well as to the Hadoop core-site.xml
and mapred-site.xml files. When adding the code snippets, place each new property (identified by the opening
<property> and closing </property> tags) between the configuration tags in the appropriate file. You can find these
files (or links to them) in the Nutch configuration directory /usr/local/nutch/conf.
First, make the nutch-site.xml file changes. These define the name of your Nutch agent and the location of the
plug-ins folders, a source of extra modules:
<configuration>
<property>
<name>http.agent.name</name>
<value>NutchHadoopCrawler</value>
</property>
<property>
<name>plugin.folders</name>
<value>/usr/local/nutch/build/plugins</value>
</property>
</configuration>
Next, make those changes that are for the Hadoop core component (core-site.xml) in the Nutch configuration
directory /usr/local/nutch/conf/ to enable gzip compression with Hadoop:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
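After saving the file, a quick check confirms the property landed where Nutch will read it; a simple sketch, assuming the configuration path given above:

```shell
# Print the codec property and the line after it (its value)
# from the core-site.xml in the Nutch configuration directory.
grep -A 1 '<name>io.compression.codecs</name>' /usr/local/nutch/conf/core-site.xml
```

If the grep prints nothing, the property was placed in the wrong file or outside the configuration tags.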
Place the changes for the Hadoop Map Reduce component in the mapred-site.xml file in the Nutch configuration
directory /usr/local/nutch/conf/. These specify the memory limit and the maximum attempt counts for both
Map and Reduce tasks, helping to prevent a runaway task from looping indefinitely or exhausting memory:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
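The snippet above shows only the memory setting. The attempt limits referred to in the text are controlled by the standard Hadoop 1.x properties mapred.map.max.attempts and mapred.reduce.max.attempts; the values below are illustrative, and the exact figures used for this cluster may differ:

```xml
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>4</value>
</property>
```

As with the other snippets, these go between the configuration tags of mapred-site.xml.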