Blur
License: Apache License, Version 2.0
Activity: Medium
Purpose: Document Warehouse
Official Page: https://incubator.apache.org/blur
Hadoop Integration: Fully Integrated
Let's say you've bought into the entire big data story using Hadoop. You've got Flume
gathering data and pushing it into HDFS, your MapReduce jobs are transforming that data and
building key-value pairs that are pushed into HBase, and you even have a couple of enterprising
data scientists using Mahout to analyze your data. At this point, your CTO walks up to you
and asks how often one of your specific products is mentioned in a feedback form you are
collecting from your users. Your heart drops as you realize the feedback is free-form text and
you've got no way to search any of that data.
Blur is a tool for indexing and searching text with Hadoop. Because it has Lucene (a very
popular text-indexing framework) at its core, it has many useful features, including fuzzy
matching, wildcard searches, and paged results. It allows you to search through unstructured
data in a way that would otherwise be very difficult.
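To make the three features named above concrete, here is a minimal sketch in plain Python. It is a toy illustration of the ideas only and does not use Blur's or Lucene's API. The function names (`build_index`, `wildcard_search`, `fuzzy_search`, `page`) and the sample feedback records are hypothetical. Fuzzy matching is approximated here with the standard library's `difflib`, which is not the algorithm Lucene uses.

```python
import re
from collections import defaultdict
from difflib import get_close_matches
from fnmatch import fnmatchcase


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def wildcard_search(index, pattern):
    """Return ids of documents with a term matching a glob pattern such as 'super*'."""
    hits = set()
    for term, ids in index.items():
        if fnmatchcase(term, pattern.lower()):
            hits |= ids
    return sorted(hits)


def fuzzy_search(index, word, cutoff=0.8):
    """Return ids of documents with a term similar to `word`, tolerating typos."""
    hits = set()
    for term in get_close_matches(word.lower(), list(index), n=10, cutoff=cutoff):
        hits |= index[term]
    return sorted(hits)


def page(results, page_number, page_size):
    """Return one page of results; page numbers start at 0."""
    start = page_number * page_size
    return results[start:start + page_size]


# Hypothetical free-form feedback, keyed by record id.
feedback = {
    1: "The SuperWidget broke after a week",
    2: "Love my superwidgit, works great",
    3: "Shipping was slow but the Widget is fine",
    4: "No complaints",
}
index = build_index(feedback)

print(wildcard_search(index, "super*"))              # [1, 2]
print(fuzzy_search(index, "superwidget"))            # [1, 2], despite the typo in record 2
print(page(wildcard_search(index, "*widg*"), 1, 2))  # [3], the second page of two-item pages
```

A real deployment would build and store the index across the cluster rather than in a single in-memory dictionary. The sketch only shows why an index over terms makes the CTO's question answerable at all.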
Tutorial Links
You can't go wrong with the official “getting started” guide on the project home page. There
is also an excellent, though slightly out of date, presentation from a Hadoop User Group
meeting in 2011.
 