Expanding Your Capability with HBase and HCatalog - Microsoft Big Data Solutions

Database Reference

In-Depth Information

From building_stg bs

Insert Overwrite Table building

select bs.building_id, bs.building_age, bs.hvac_type;

14. Now that the tables are loaded, you need to build the indexes:

ALTER INDEX Building_IDX_1 ON sensor REBUILD;

ALTER INDEX Building_IDX_2 ON building REBUILD;

After the index has been built, the data is now ready to analyze.

You can use Hive to query and aggregate the data. The following query

determines the maximum temperature difference between target and

actual temperatures for each day and HVAC type:

select max(s.delta_tmp), s.dt, b.hvac_type

from sensor s join building b on (s.building_id =

b.building_id)

Group By s.dt, b.hvac_type;

In Chapter 11, “Visualizing Big Data with Microsoft BI,” youwill seehowyou

canusetheHiveODBCconnectortoloadandanalyzethedatainMicrosoft's

BI toolset.

Using HBase or Hive as a Data Warehouse

Although both HBase and Hive are both considered data warehouse

structures, they differ significantly as to how they store and query data. Hive

is more like a traditional data warehouse reporting system. It structures the

data in a set of tables that you can join, aggregate, and query on using a

query language (Hive Query Language [HQL]) that is very similar to the

SQL,whichmostdatabasedevelopersarealreadyusedtoworkingwith.This

relieves you from having to write MapReduce code. The downside to Hive

is it can take a long time to process through the data and is not intended to

give clients instant results.

Hive is usually used to run processing through scheduled jobs and then

load the results into a summary type table that can be queried on by client

applications. One of the strengths of Hive and HCatalog is the ability to pass

data between traditional relational databases such as SQL Server. A good

use case for Hive and HCatalog is to load large amounts of unstructured

Search WWH ::

Custom Search

Home