Tutorial Links
The GitHub page for the Parquet format project is a great place to start if you're interested in
learning a bit more about how the technology works. If, on the other hand, you'd like to dive
straight into examples, move over to the GitHub page for the parquet-mr project.
Example Code
The Parquet file format is supported by many of the standard Hadoop tools, including Hive
(described here) and Pig (described here). Using the Parquet data format is typically as easy
as adding a couple of lines to your CREATE TABLE command or changing a few words in your
Pig script.
For example, to change our Hive example to use Parquet instead of the delimited textfile
format, we simply refer to Parquet when we create the table:
CREATE EXTERNAL TABLE movie_reviews
(reviewer STRING, title STRING, rating INT)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
LOCATION '/data/reviews';
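Once the table is defined, queries run against it just as they would against a text-backed
table; Hive handles the Parquet encoding transparently. As a brief sketch using the
movie_reviews table created above (the query itself is illustrative, not part of the original
example):

-- Average rating per title, read directly from the Parquet files
SELECT title, AVG(rating) AS avg_rating
FROM movie_reviews
GROUP BY title;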
We can similarly modify our Pig example to load a review file that is stored in the Parquet
format instead of CSV:
reviews = load 'reviews.pqt' using parquet.pig.ParquetLoader()
as (reviewer:chararray, title:chararray, rating:int);
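Writing data back out in the Parquet format is just as straightforward. As a minimal sketch,
this uses the parquet.pig.ParquetStorer class that ships alongside the loader in the
parquet-mr project; the output path is hypothetical:

-- Store the reviews relation as Parquet files under a sample output directory
store reviews into '/data/reviews_parquet' using parquet.pig.ParquetStorer();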