Querying NoSQL Stores - Professional NoSQL - page 129

Databases Reference

In-Depth Information

...

... return { average:(sum/count) };

... };

> var average_rating_per_movie = db.ratings.mapReduce(map, reduce);

> db[average_rating_per_movie.result].find();

movielens_queries.txt

MapReduce allows you to write many types of sophisticated aggregation algorithms, some of which

were presented in this section. A few others are introduced later in the topic.

By now you have had a chance to understand many ways of querying MongoDB collections. Next,

you get a chance to familiarize yourself with querying tabular databases. HBase is used to illustrate

the querying mechanism.

ACCESSING DATA FROM COLUMN-ORIENTED

DATABASES LIKE HBASE

Before you get into querying an HBase data store, you need to store some data in it. As with

MongoDB, you have already had a fi rst taste of storing and accessing data in HBase and its

underlying fi lesystem, which often defaults to Hadoop Distributed FileSystem (HDFS). You are

also aware of HBase and Hadoop basics. This section builds on that basic familiarity. As a working

example, historical daily stock market data from NYSE since the 1970s until February 2010 is loaded

into an HBase instance. This loaded data set is accessed using an HBase-style querying mechanism.

The historical market data is collated from original sources by Infochimp.org and can be accessed at

www.infochimps.com/datasets/nyse-daily-1970-2010-open-close-high-low-and-volume .

The Historical Daily Market Data

The zipped-up download of the entire data set is substantial at 199 MB but very small by HDFS and

HBase standards. The HBase and Hadoop infrastructures are capable of and often used for dealing

with petabytes of data that span multiple physical machines. I chose an easily manageable data set

for the example as I intentionally want to avoid getting distracted by the immensity of preparing and

loading up a large data set for now. This chapter is about the query methods in NoSQL stores

and the focus in this section is on column-oriented databases. Understanding data access in smaller

data sets is more manageable and the concepts apply equally well to larger amounts of data.

The data fi elds are partitioned logically into three different types:

Combination of exchange, stock symbol, and date served as the unique id

The open, high, low, close, and adjusted close are a measure of price

➤

➤

The daily volume

The row-key can be created using a combination of the exchange, stock symbol, and date. So

NYSE,AA,2008-02-27 could be structured as NYSEAA20080227 to be a row-key for the data. All

price-related information can be stored in a column-family named price and volume data can be

stored in a column-family named volume .

➤

Next Page

Professional NoSQL

Search WWH ::

Custom Search

Home