NoSQL data architecture patterns - Making Sense of NoSQL

but in the LOD the data was created by different organizations, so the only way to join the data is to use consistent URIs to identify nodes.

The number of datasets that participate in the LOD community is large and growing, but as you might guess, there are few ways to guarantee the quality and consistency of public data. If you find inconsistencies and missing data, there's no easy way to create bulk updates to correct the source data. This means you may need to manually edit hundreds of Wiki pages in order to add or correct data. After this is done, you may need to wait till the next time the pages get indexed by the RDF extraction tools.

These challenges have led to the concept of curated datasets, which are based on public data but then undergo a postprocessing cleanup and normalization phase to make the data more usable by organizations.

In this section, we've covered graph representations and shown how organizations are using graph stores to solve business problems. We now move on to our third NoSQL data architecture pattern.

4.3 Column family (Bigtable) stores

As you've seen, key-value stores and graph stores have simple structures that are useful for solving a variety of business problems. Now let's look at how you can combine a row and column from a table to use as the key.

Column family systems are important NoSQL data architecture patterns because they can scale to manage large volumes of data. They're also closely tied to many MapReduce systems. As you may recall from our discussion of MapReduce in chapter 2, MapReduce is a framework for performing parallel processing on large datasets across multiple computers (nodes). In the MapReduce framework, the map operation has a master node that breaks up an operation into subparts and distributes each subpart to another node for processing. Reduce is the process where the master node collects the results from the other nodes and combines them into the answer to the original problem.
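
The map and reduce steps described above can be sketched on a single machine. This is only an illustration, not how any particular MapReduce system is implemented: the word-count task, the function names, and the use of a thread pool to stand in for worker nodes are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Master step: break the input into roughly equal subparts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_word_counts(chunk):
    """Worker step: count the words in one subpart of the input."""
    counts = defaultdict(int)
    for line in chunk:
        for word in line.split():
            counts[word.lower()] += 1
    return dict(counts)


def reduce_word_counts(partials):
    """Master step: combine the partial results into one answer."""
    totals = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            totals[word] += n
    return dict(totals)


def word_count(lines, n_workers=3):
    """Run map on each chunk (threads stand in for nodes), then reduce."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_word_counts, chunks))
    return reduce_word_counts(partials)


if __name__ == "__main__":
    print(word_count(["a b a", "b c", "A"], n_workers=2))
```

In a real cluster the chunks would live on different machines and the partial results would travel over the network, but the division of labor between the master and the workers is the same.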

Column family stores use row and column identifiers as general-purpose keys for data lookup. They're sometimes referred to as data stores rather than databases, since they lack features you may expect to find in traditional databases. For example, they lack typed columns, secondary indexes, triggers, and query languages. Almost all column family stores have been heavily influenced by the original Google Bigtable paper. HBase, Hypertable, and Cassandra are good examples of systems that have Bigtable-like interfaces, although how they're implemented varies.
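
To make the idea of row and column identifiers acting as a key more concrete, here is a minimal in-memory sketch. The class name `ColumnFamilyStore`, its methods, and the three-part key (row ID, column family, column name) are hypothetical choices for illustration and are not the API of HBase, Hypertable, or Cassandra. Real systems typically add further key parts such as a timestamp.

```python
class ColumnFamilyStore:
    """Toy store: each cell is addressed by (row_id, family, column)."""

    def __init__(self):
        # A flat dictionary keyed by the composite identifier.
        # Values are untyped; the store does not inspect them.
        self._cells = {}

    def put(self, row_id, family, column, value):
        """Insert or overwrite a single cell."""
        self._cells[(row_id, family, column)] = value

    def get(self, row_id, family, column, default=None):
        """Look up one cell by its full key."""
        return self._cells.get((row_id, family, column), default)

    def get_row(self, row_id):
        """Return every cell stored for a row, keyed by (family, column)."""
        return {
            (family, column): value
            for (row, family, column), value in self._cells.items()
            if row == row_id
        }


if __name__ == "__main__":
    store = ColumnFamilyStore()
    store.put("user:42", "profile", "name", "Ann")
    store.put("user:42", "stats", "logins", 7)
    print(store.get("user:42", "profile", "name"))
    print(store.get_row("user:42"))
```

Because only cells that were written take up space, rows can have entirely different sets of columns, which is one reason this layout suits sparse data.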

We should note that the term column family is distinct from the term column store. A column-store database stores all information within a column of a table at the same location on disk in the same way a row-store keeps row data together. Column stores are used in many OLAP systems because their strength is rapid column aggregate calculation. MonetDB, Sybase IQ, and Vertica are examples of column-store systems. Column-store databases provide a SQL interface to access their data.
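
The difference between the two layouts can be sketched with plain Python structures. This is a simplified illustration, not a description of how MonetDB, Sybase IQ, or Vertica store data on disk. The helper names `to_column_layout` and `column_sum` are hypothetical, and the sketch assumes every row has the same fields.

```python
def to_column_layout(rows):
    """Regroup row-oriented records so each column's values sit together."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def column_sum(columns, name):
    """Aggregate one column by scanning only that column's values."""
    return sum(columns[name])


if __name__ == "__main__":
    # Row layout: all fields of one record are kept together.
    sales_rows = [
        {"region": "east", "units": 10},
        {"region": "west", "units": 5},
    ]
    # Column layout: all values of one field are kept together.
    sales_columns = to_column_layout(sales_rows)
    print(sales_columns)
    print(column_sum(sales_columns, "units"))
```

In the column layout, summing `units` touches one contiguous list and never reads `region`, which is the property that makes column aggregates fast in OLAP workloads.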
