Database Reference
In-Depth Information
Figure 7.9 Viewing partition information.
You can create partitions on multiple columns, which results in a separate
data directory for each distinct combination of values from the partition
columns. For example, it may be beneficial to partition the flight data table
by both the flight date and airport code depending on the amount of data
and the types of queries. Another option is to further bucket the partitions
using Clustered By and order the data in the buckets with a Sorted By
command. The following statement creates a flight data table partitioned by
flight date, bucketed by airport code, and sorted by departure time:
Create Table flightData_Bucketed (airline_cd
int,airport_cd string,
delay int,dep_time int) Partitioned By(flight_date
string)
Clustered By(airport_cd) Sorted By(dep_time) into 25
buckets;
When loading data into a partitioned table, it is up to you to ensure that the
data is loaded into the right partition. You can use Hive to load the tables
directly from a file. For example, the following statement is used to load a
file containing daily flight data into a table partitioned by date:
LOAD DATA INPATH '/flight/data/
flightdata_2013-01-01.txt'
INTO TABLE flightdata PARTITION(date='2013-01-01')
You can also use Hive to load data from one table into another table. For
example, you may want to load data from a staging table into the partitioned
tables, as illustrated by the following statement:
 
Search WWH ::




Custom Search