Figure 8-12. Index sharding example
An incoming search query is analyzed for the queried fields. If all fields are indexed,
parallel search queries will be sent to all index shards. The responses will be collected,
sorted, and then returned to the user. If the search fields are not indexed, a new map-reduce
search job will be created and submitted to the Hadoop job tracker.
When performing a query, all index shards are queried in parallel, which keeps
response times low. When a user formulates a query, it is first analyzed to determine
whether it can be run against the Lucene indexes. This is not the case if the user
specifies search fields that are not indexed; such a query is run as a map-reduce job
instead. If the query can be run against the Lucene indexes, it is forwarded to all data
nodes in parallel. The results of these subqueries are collected and sorted. The log
messages are then read from HDFS using the primary keys contained in the Lucene index results.
If a query to a single shard fails, the search results may be incomplete, but the queries
to the other shards are not affected. This greatly enhances the availability of the system.
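The routing and fan-out described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the system's actual implementation: the `INDEXED_FIELDS` set, the `search` function, the shard callables, and the `submit_mapreduce_job` callback are all hypothetical stand-ins for the Lucene index shards and the Hadoop job tracker.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical set of fields covered by the index shards.
INDEXED_FIELDS = {"host", "level", "timestamp"}


def search(query_fields, shards, submit_mapreduce_job):
    """Route a query to the index shards or fall back to a map-reduce job.

    `shards` is a list of callables (hypothetical stand-ins for index
    shards); each takes the query fields and returns a list of hits
    shaped like {"key": ..., "score": ...}.
    """
    # Any non-indexed field forces the slower map-reduce path.
    if not set(query_fields) <= INDEXED_FIELDS:
        return {"mode": "mapreduce", "job": submit_mapreduce_job(query_fields)}

    hits, failed = [], 0
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        # Send the same query to every shard in parallel.
        futures = [pool.submit(shard, query_fields) for shard in shards]
        for future in futures:
            try:
                hits.extend(future.result())
            except Exception:
                # A failing shard only makes the result incomplete;
                # the other shards are unaffected.
                failed += 1

    # Collect and sort the partial results before returning them.
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return {"mode": "index", "hits": hits, "complete": failed == 0}
```

The `complete` flag is one possible way to tell the caller that a shard failed; the keys in the returned hits would then be used to read the full log messages from HDFS.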
Building a Recommendation System
With the number of options available to users ever increasing, the attention
span of customers is getting shorter and shorter. Customers are used to seeing their best
choices right in front of them. In such a scenario, recommendations power
more and more product features and drive user interaction. Hence companies
are looking for more ways to target customers precisely and at the right time. Examples
of recommendation systems include product recommendations, merchant
recommendations, content recommendations, social recommendations, query
recommendations, and display and search ads (Figure 8-13).
 