Expanding Your Capability with HBase and HCatalog - Microsoft Big Data Solutions

Database Reference

In-Depth Information

Figure 7.9 Viewing partition information.

You can create partitions on multiple columns, which results in a separate

data directory for each distinct combination of values from the partition

columns. For example, it may be beneficial to partition the flight data table

by both the flight date and airport code depending on the amount of data

and the types of queries. Another option is to further bucket the partitions

using Clustered By and order the data in the buckets with a Sorted By

command. The following statement creates a flight data table partitioned by

flight date, bucketed by airport code, and sorted by departure time:

Create Table flightData_Bucketed (airline_cd

int,airport_cd string,

delay int,dep_time int) Partitioned By(flight_date

string)

Clustered By(airport_cd) Sorted By(dep_time) into 25

buckets;

When loading data into a partitioned table, it is up to you to ensure that the

data is loaded into the right partition. You can use Hive to load the tables

directly from a file. For example, the following statement is used to load a

file containing daily flight data into a table partitioned by date:

LOAD DATA INPATH '/flight/data/

flightdata_2013-01-01.txt'

INTO TABLE flightdata PARTITION(date='2013-01-01')

You can also use Hive to load data from one table into another table. For

example, you may want to load data from a staging table into the partitioned

tables, as illustrated by the following statement:

Search WWH ::

Custom Search

Home