Database and Data Management - Field Guide to Hadoop

Blur

License: Apache License, Version 2.0
Activity: Medium
Purpose: Document Warehouse
Official Page:
Hadoop Integration: Fully Integrated

Let's say you've bought into the entire big data story using Hadoop. You've got Flume gathering data and pushing it into HDFS, your MapReduce jobs are transforming that data and building key-value pairs that are pushed into HBase, and you even have a couple of enterprising data scientists using Mahout to analyze your data. At this point, your CTO walks up to you and asks how often one of your specific products is mentioned in a feedback form you are collecting from your users. Your heart drops as you realize the feedback is free-form text and you've got no way to search any of that data.

Blur is a tool for indexing and searching text with Hadoop. Because it has Lucene (a very popular text-indexing framework) at its core, it has many useful features, including fuzzy matching, wildcard searches, and paged results. It allows you to search through unstructured data in a way that would otherwise be very difficult.
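
To make the three features named above concrete, here is a minimal sketch in plain Python. It is not Blur or Lucene code and does not use their APIs or index structures; it only imitates wildcard matching, fuzzy matching, and paging on a tiny in-memory list, using the standard library. The sample feedback strings, the function names, and the similarity cutoff of 0.8 are all assumptions made up for illustration.

```python
import difflib
import fnmatch

# Hypothetical feedback entries standing in for free-form form submissions.
FEEDBACK = [
    "The widget broke after a week",
    "Love the new wigdet, works great",   # note the misspelling
    "Gadget shipping was slow",
    "Widgets are overpriced",
    "No complaints about the gizmo",
]


def tokenize(text):
    """Lowercase the text and split on whitespace, trimming basic punctuation."""
    return [word.strip(".,!?").lower() for word in text.split()]


def wildcard_search(docs, pattern):
    """Return docs containing a word that matches a shell-style pattern (e.g. 'widget*')."""
    return [d for d in docs
            if any(fnmatch.fnmatchcase(w, pattern) for w in tokenize(d))]


def fuzzy_search(docs, term, cutoff=0.8):
    """Return docs containing a word whose similarity to `term` is at least `cutoff`."""
    return [d for d in docs
            if difflib.get_close_matches(term, tokenize(d), n=1, cutoff=cutoff)]


def page(results, start, size):
    """Return one page of results: `size` items beginning at offset `start`."""
    return results[start:start + size]


if __name__ == "__main__":
    print(wildcard_search(FEEDBACK, "widget*"))
    fuzzy_hits = fuzzy_search(FEEDBACK, "widget")
    print(page(fuzzy_hits, 0, 2))   # first page, two results per page
```

A real search engine answers these queries from an inverted index rather than by scanning every document, which is what makes the approach practical at Hadoop scale; the linear scan here is only for readability.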

Tutorial Links

You can't go wrong with the official “getting started” guide on the project home page. There is also an excellent, though slightly out of date, presentation from a Hadoop User Group meeting in 2011.
