Database Reference
In-Depth Information
an example of creating a copy of a table in Parquet format using CREATE TABLE...AS
SELECT (see CREATE TABLE...AS SELECT ):
CREATE TABLE users_parquet STORED AS PARQUET
AS
SELECT * FROM users;
Using a custom SerDe: RegexSerDe
Let's see how to use a custom SerDe for loading data. We'll use a contrib SerDe that uses
a regular expression for reading the fixed-width station metadata from a text file:
CREATE TABLE stations (usaf STRING, wban STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(\\d{6}) (\\d{5}) (.{29}) .*"
);
In previous examples, we have used the DELIMITED keyword to refer to delimited text
in the ROW FORMAT clause. In this example, we instead specify a SerDe with the SERDE
keyword and the fully qualified classname of the Java class that implements the SerDe,
org.apache.hadoop.hive.contrib.serde2.RegexSerDe .
SerDes can be configured with extra properties using the WITH SERDEPROPERTIES
clause. Here we set the input.regex property, which is specific to RegexSerDe .
input.regex is the regular expression pattern to be used during deserialization to turn
the line of text forming the row into a set of columns. Java regular expression syntax is
used for the matching, and columns are formed from capturing groups of parentheses. [ 115 ]
In this example, there are three capturing groups for usaf (a six-digit identifier), wban (a
five-digit identifier), and name (a fixed-width column of 29 characters).
To populate the table, we use a LOAD DATA statement as before:
LOAD DATA LOCAL INPATH "input/ncdc/metadata/stations-fixed-width.txt"
INTO TABLE stations;
Recall that LOAD DATA copies or moves the files to Hive's warehouse directory (in this
case, it's a copy because the source is the local filesystem). The table's SerDe is not used
for the load operation.
When we retrieve data from the table the SerDe is invoked for deserialization, as we can
see from this simple query, which correctly parses the fields for each row:
Search WWH ::




Custom Search