A quick check of the installation under /usr/local/ shows the solr-4.7.0 installation directory owned by the
hadoop user, with the solr link pointing to it:
[root@hc1nn local]# ls -ld *solr*
lrwxrwxrwx. 1 root root 10 Mar 29 13:11 solr -> solr-4.7.0
drwxr-xr-x. 7 hadoop hadoop 4096 Feb 22 08:39 solr-4.7.0
At this point you have Solr installed in the correct location, and you are ready to configure it. You can set up a
variable in the hadoop user's Bash shell to point to the Solr installation. Add the following text to the bottom of the
Linux hadoop account configuration file $HOME/.bashrc. This will define the Bash shell environment variable
SOLR_HOME to be /usr/local/solr.
#######################################################
# Set up Solr variables
export SOLR_HOME=/usr/local/solr
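If you would rather script this step than edit the file by hand, the following is a minimal sketch. The function name add_solr_home is a hypothetical helper, not part of Solr or Hadoop, and it assumes the .bashrc file is writable by the current user. It appends the export line only when that exact line is not already present, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper: append the SOLR_HOME export to a Bash rc file once.
# Usage: add_solr_home [rc_file]   (defaults to $HOME/.bashrc)
add_solr_home() {
    rc_file="${1:-$HOME/.bashrc}"
    export_line='export SOLR_HOME=/usr/local/solr'

    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$export_line" "$rc_file" 2>/dev/null; then
        printf '\n# Set up Solr variables\n%s\n' "$export_line" >> "$rc_file"
    fi
}
```

After calling add_solr_home, start a new shell or run source $HOME/.bashrc, and then echo $SOLR_HOME should print the path.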
Next, configure Solr to integrate it with Nutch. Some of the Nutch configuration files need to be copied to the Solr
configuration directory; copy the files schema.xml and schema-solr4.xml across:
[hadoop@hc1nn ~]$ cd $NUTCH_HOME/conf
[hadoop@hc1nn conf]$ cp schema.xml $SOLR_HOME/example/solr/collection1/conf
[hadoop@hc1nn conf]$ cp schema-solr4.xml $SOLR_HOME/example/solr/collection1/conf
These schema files define the field types and fields that the documents being indexed will contain. Solr uses the
information in the schema files to help it parse and index the data that it processes.
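The copy step above replaces any file of the same name in the destination directory. As a cautious variation, the sketch below saves a .orig copy of each file that would be overwritten before copying the Nutch version across. The function name copy_schemas and the .orig suffix are assumptions made for illustration; the two directories are passed as arguments rather than hard-coded.

```shell
# Hypothetical helper: copy the two Nutch schema files into a Solr conf
# directory, saving a .orig copy of any file that would be overwritten.
# Usage: copy_schemas SRC_DIR DEST_DIR
copy_schemas() {
    src_dir="$1"
    dest_dir="$2"

    for schema in schema.xml schema-solr4.xml; do
        # Preserve the existing file (if any) before replacing it
        if [ -f "$dest_dir/$schema" ]; then
            cp -p "$dest_dir/$schema" "$dest_dir/$schema.orig" || return 1
        fi
        cp "$src_dir/$schema" "$dest_dir/$schema" || return 1
    done
}
```

With the variables used in this section, the call would be copy_schemas "$NUTCH_HOME/conf" "$SOLR_HOME/example/solr/collection1/conf".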
Next, add a few extra fields at the end of the <fields> section of schema.xml:
<!-- fields for Nutch -->
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="text" type="string" indexed="true" stored="true"/>
The filter factory currently listed in the file is EnglishPorterFilterFactory, which has been
deprecated. Replace it with SnowballPorterFilterFactory, which applies the stemming algorithm originally devised by
Martin Porter; the filter prepares document tokens before Solr indexes them.
Look for this line:
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
And replace it with this one:
<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
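The same replacement can be made from the command line instead of in an editor. The following is a sketch only: swap_porter_filter is a hypothetical function name, and it assumes the line in your schema file matches the text shown above exactly (apart from leading indentation, which is preserved). It keeps the unmodified file as a .bak copy so the change can be undone.

```shell
# Hypothetical helper: replace the deprecated EnglishPorterFilterFactory line
# with the SnowballPorterFilterFactory line, keeping a .bak copy of the file.
# Usage: swap_porter_filter SCHEMA_FILE
swap_porter_filter() {
    schema_file="$1"
    old='<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>'
    new='<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>'

    # Back up first, then write the edited copy over the original
    cp -p "$schema_file" "$schema_file.bak" || return 1
    # '|' is used as the sed delimiter because the text contains '/'
    sed "s|$old|$new|" "$schema_file.bak" > "$schema_file"
}
```

For example, swap_porter_filter "$SOLR_HOME/example/solr/collection1/conf/schema.xml" followed by grep PorterFilterFactory on the same file shows whether the substitution took effect.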
Now try starting Solr to test that it will work:
[hadoop@hc1nn conf]$ cd $SOLR_HOME/example/
[hadoop@hc1nn example]$ java -jar start.jar &
The & symbol at the end of the line runs the Solr job in the background. Look for
any errors in the output displayed in the session window.
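Besides reading the console output, you can check that the server is answering HTTP requests. The sketch below polls a URL with curl until it responds or a retry limit is reached. It assumes that curl is installed and that Solr is listening on its default example port, 8983, on the local host; the function name wait_for_solr and the retry defaults are hypothetical, so adjust the URL if your setup differs.

```shell
# Hypothetical helper: poll a Solr URL until it responds or retries run out.
# Usage: wait_for_solr [url] [tries] [delay_seconds]
wait_for_solr() {
    url="${1:-http://localhost:8983/solr/}"   # assumed default example port
    tries="${2:-10}"
    delay="${3:-3}"

    attempt=0
    while [ "$attempt" -lt "$tries" ]; do
        # -s: silent, -f: treat HTTP errors as failure, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            echo "Solr is responding at $url"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done

    echo "No response from $url after $tries attempts" >&2
    return 1
}
```

Running wait_for_solr with no arguments shortly after java -jar start.jar & gives the server up to about 30 seconds to come up before reporting a failure.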
 