Database and Data Management - Field Guide to Hadoop


a popular framework for indexing and searching documents, and implements that framework by providing a set of tools for building indexes and querying data.

While Solr is able to use the Hadoop Distributed File System (HDFS; described here) to store data, it is not truly compatible with Hadoop and does not use MapReduce (described here) or YARN (described here) to build indexes or respond to queries. There is a similar effort named Blur (described here) to build a tool on top of the Lucene framework that leverages the entire Hadoop stack.

Tutorial Links

Apart from the tutorial on the official Solr home page, there is a Solr wiki with great information.

Example Code

In this example, we're going to assume we have a set of semi-structured data consisting of movie reviews with labels that clearly mark the title and the text of the review. These reviews will be stored in individual JSON files in the reviews directory.
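The text does not show what one of these files looks like, so the following is only a minimal sketch of how a review could be written out. The field names title and review_text are taken from the search at the end of this example; the helper name write_review, the file name, and the sample values are hypothetical, and a real Solr schema may require additional fields (such as a unique key).

```python
import json
import tempfile
from pathlib import Path


def write_review(directory: Path, name: str, title: str, review_text: str) -> Path:
    """Write one review as a JSON file and return the path of the new file."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    # "title" and "review_text" match the field names used in the search example.
    document = {"title": title, "review_text": review_text}
    path.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the reviews directory (assumption).
    reviews_dir = Path(tempfile.mkdtemp()) / "reviews"
    sample = write_review(
        reviews_dir, "review-001", "An Example Movie", "A great film with a weak ending."
    )
    print(sample.read_text(encoding="utf-8"))
```

Keeping one review per file mirrors the layout described above, where each review lives in its own JSON file.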

We'll start by telling Solr to index our data; there are a handful of different ways to do this, all with unique trade-offs. In this case, we're going to use the simplest mechanism, which is the post.sh script located in the exampledocs/ subdirectory of our Solr install:

./example/exampledocs/post.sh /reviews/*.json
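One of the other ways to index data is to send documents to Solr's update handler directly over HTTP. The sketch below only builds such a request and does not send it, so it can be inspected without a running server. The host, port, and core name ("reviews") in the URL are assumptions, as is the id field in the sample document; adjust them to match your own install and schema.

```python
import json
import urllib.request

# Assumed values: a local Solr on its default port and a core named "reviews".
SOLR_UPDATE_URL = "http://localhost:8983/solr/reviews/update?commit=true"


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) an HTTP POST carrying a JSON array of documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    docs = [
        {
            "id": "review-001",  # hypothetical unique key
            "title": "An Example Movie",
            "review_text": "A great film with a weak ending.",
        }
    ]
    request = build_update_request(docs)
    print(request.get_method(), request.full_url)
    # To send it to a running Solr instance: urllib.request.urlopen(request)
```

The commit=true parameter asks Solr to make the new documents searchable right away, which is convenient for small experiments but usually too costly to do on every request in a larger load.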

Once our reviews have been indexed, they are ready to search. Solr has its own graphical user interface (GUI) that can be used for simple searches. We'll pull up that GUI and search for movie reviews that contain the word “great”:

review_text:great&fl=title

This search tells Solr that we want to retrieve the title field (fl=title) for any review where the word “great” appears in the review_text field.
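The same search can also be issued outside the GUI as an HTTP request to Solr's select handler. The following sketch builds the request URL from the two parts of the search above: the query goes into the q parameter and the field list into fl. The host, port, and core name are assumptions, build_search_url is a hypothetical helper, and wt=json is added only to ask for a JSON response.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed values: a local Solr on its default port and a core named "reviews".
SOLR_SELECT_URL = "http://localhost:8983/solr/reviews/select"


def build_search_url(field, term, return_fields, base_url=SOLR_SELECT_URL):
    """Return a select URL that searches `field` for `term`."""
    params = {
        "q": f"{field}:{term}",          # which field to search, and for what
        "fl": ",".join(return_fields),   # which fields to return for each match
        "wt": "json",                    # response format
    }
    # urlencode percent-escapes characters such as ":" for us.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_search_url("review_text", "great", ["title"])
    print(url)
    # Decoding the query string recovers the original parameters.
    print(parse_qs(urlsplit(url).query))
```

Building the URL with urlencode rather than string concatenation avoids escaping mistakes once search terms contain spaces or punctuation.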
