data, aggregate the data, and push the results back to SQL Server where
analysts can use the Microsoft BI toolset to explore the results. Another
point to consider is that Hive and HCatalog do not allow data updates. When
you load the data, you can either replace or append. This makes loading
large amounts of data extremely fast, but limits your ability to track changes.
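The replace-or-append behavior can be illustrated with a small in-memory model. This is only a sketch: the class AppendOnlyTable, its load method, and the sample rows are hypothetical names and values made up for illustration, not part of Hive or HCatalog. The point it shows is that a load either swaps out the whole table or adds to it, and that no operation exists for changing a row already stored.

```python
class AppendOnlyTable:
    """Toy model of a table that supports bulk loads but no row updates."""

    def __init__(self):
        self._rows = []

    def load(self, rows, overwrite=False):
        # overwrite=True replaces the table contents;
        # otherwise the new batch is appended to what is already there.
        batch = list(rows)
        if overwrite:
            self._rows = batch
        else:
            self._rows.extend(batch)

    def rows(self):
        # Return a copy so callers cannot modify stored rows in place.
        return list(self._rows)


sales = AppendOnlyTable()
sales.load([("2013-01-01", 100), ("2013-01-02", 150)])
sales.load([("2013-01-03", 90)])                    # append: three rows now
sales.load([("2013-02-01", 200)], overwrite=True)   # replace: earlier rows are gone
```

After the final load, the earlier rows cannot be recovered from the table, which is the change-tracking limitation mentioned above.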
HBase, however, is a key/value data store that allows you to read, write, and
update data. It is designed for fast random-access reads by key value across
large amounts of data. It is not designed for fast bulk loading of large data
sets, but rather for quick updates and inserts of individual records that may
be streaming in from a source. It also is not designed
to perform aggregations of the data. Its command shell is based on JRuby, and
its syntax is unfamiliar to most SQL developers. Having said that, HBase will
be your tool of choice under some circumstances. Suppose, for example, that
you have a huge store of e-mail messages and you need to occasionally pull
one for auditing. You may also tag the e-mails with identifying fields that
may occasionally need updating. This is an excellent use case for HBase.
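The e-mail scenario can be sketched with a toy key/value table. This is a minimal in-memory stand-in, not the HBase API: the class KeyValueTable, its put and get methods, the row keys, and the column names body and tag are all assumptions chosen for illustration. It shows the two access patterns the scenario relies on: pulling one message by its key, and updating a single tagging field in place.

```python
class KeyValueTable:
    """Toy in-memory stand-in for a key/value table (not the HBase API)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        # Insert or update one column of one row.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # Point lookup by key; returns a copy of the row, or None if absent.
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None


emails = KeyValueTable()
emails.put("msg-0001", "body", "Quarterly numbers attached.")
emails.put("msg-0001", "tag", "unreviewed")
emails.put("msg-0002", "body", "Lunch on Friday?")

emails.put("msg-0001", "tag", "audited")   # update one field in place
audited = emails.get("msg-0001")           # pull a single message by key
```

Neither operation scans the other messages. Both go straight to the row identified by the key, which is why this access pattern suits a key/value store.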
If you do need to aggregate and process the data before placing it into a
summary table that needs to be updated, you can always use HBase and
Hive together. You can load and aggregate the data with Hive and push
the results to a table in HBase, where the data summary statistics can be
updated.
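The combined pattern can be sketched as two stages. This is an illustration under stated assumptions, not the Hive or HBase API: the function aggregate stands in for the batch aggregation step, the dictionary summary_table stands in for the updatable key/value table, and the customer names and amounts are made up.

```python
from collections import defaultdict


def aggregate(rows):
    """Batch step: total the amount per customer."""
    totals = defaultdict(int)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


# Stage 1: aggregate the raw data in bulk.
raw_orders = [("alice", 30), ("bob", 20), ("alice", 15)]

# Stage 2: push each summary row into a keyed table.
summary_table = {}
for customer, total in aggregate(raw_orders).items():
    summary_table[customer] = {"total": total, "status": "new"}

# Later, a single summary row can be updated without reloading the rest.
summary_table["alice"]["status"] = "reviewed"
```

The division of labor is the design choice here: the bulk step does the heavy aggregation once, and the keyed table absorbs small, frequent updates afterward.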
Summary
This chapter examined two tools that you can use to create structure on
top of your big data stored in HDFS. HBase is a tool that creates key/
value tuples on top of the data and stores the key values in a columnar
storage structure. Its strength is that it enables fast lookups and supports
consistency when updating the data. The other tool, HCatalog, offers a
relational table abstraction layer over HDFS. This layer lets tools such as
Pig and Hive treat the data as familiar relational tables. It also permits easier exchange of data
between the HDFS storage and relational databases such as SQL Server and
Oracle.