Serialization - Field Guide to Hadoop


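import java.io.FileInputStream;

// Read back a Review message that was serialized earlier to review.dat.
// Review is the protoc-generated class from the preceding example; parseFrom
// is a static factory on generated classes, so its result must be assigned.
FileInputStream input = new FileInputStream("review.dat");
Review review = Review.parseFrom(input);
input.close();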

Parquet

License: Apache License, Version 2.0
Activity: Medium
Purpose: File Format
Official Page: http://parquet.io
Hadoop Integration: API Compatible

One of the most compelling ideas behind an open ecosystem of tools such as Hadoop is the ability to choose the right tool for each specific job. For example, you can choose between distcp or Flume (both described elsewhere in this guide) for moving your data into your cluster; Java MapReduce or Pig for building big data processing jobs; Puppet or Chef (also described elsewhere in this guide) for managing your cluster; and so on. This differs from many traditional platforms, which offer a single tool for each job: an open ecosystem provides flexibility, but at the cost of complexity.

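Parquet is one choice among many for managing the way your data is stored. It is a columnar data storage format: values from the same column are stored together rather than record by record. As a result, it performs very well with structured data that has a fair amount of repetition, because similar neighboring values compress well and queries that need only a few columns can skip the rest. On the other hand, the Parquet format is fairly complex and does not perform as well when you want to retrieve entire records at a time, since each record has to be reassembled from its separate columns.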
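To make the trade-off concrete, here is a minimal Java sketch (Java 16+ for records) that stores the same small table row by row and column by column. It is an illustration only, not Parquet's file layout or API: the class `ColumnarSketch`, the `Review` fields, and the sample data are hypothetical, and Parquet's real encodings (such as dictionary and run-length encoding) are far more elaborate than the toy run-length encoder below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of row vs. columnar layout; not Parquet's format.
public class ColumnarSketch {

    // Row-oriented layout: each record keeps all of its fields together.
    record Review(String product, int stars, String country) {}

    // Column-oriented layout: one array per field, indexed by row number.
    static final class ReviewColumns {
        final String[] product;
        final int[] stars;
        final String[] country;

        ReviewColumns(List<Review> rows) {
            int n = rows.size();
            product = new String[n];
            stars = new int[n];
            country = new String[n];
            for (int i = 0; i < n; i++) {
                product[i] = rows.get(i).product();
                stars[i] = rows.get(i).stars();
                country[i] = rows.get(i).country();
            }
        }

        // Rebuilding one whole record has to touch every column array.
        Review row(int i) {
            return new Review(product[i], stars[i], country[i]);
        }
    }

    // Aggregating one field reads a single contiguous array and ignores the others.
    static double averageStars(ReviewColumns cols) {
        return Arrays.stream(cols.stars).average().orElse(0.0);
    }

    // Toy run-length encoding: each run of equal adjacent values becomes "value:count".
    static List<String> runLengthEncode(String[] column) {
        List<String> runs = new ArrayList<>();
        int i = 0;
        while (i < column.length) {
            int j = i;
            while (j < column.length && column[j].equals(column[i])) {
                j++;
            }
            runs.add(column[i] + ":" + (j - i));
            i = j;
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Review> rows = List.of(
            new Review("toaster", 4, "US"),
            new Review("toaster", 5, "US"),
            new Review("kettle", 3, "US"),
            new Review("kettle", 4, "CA"));
        ReviewColumns cols = new ReviewColumns(rows);

        System.out.println(averageStars(cols));            // 4.0
        System.out.println(runLengthEncode(cols.country)); // [US:3, CA:1]
        System.out.println(cols.row(2));                   // Review[product=kettle, stars=3, country=US]
    }
}
```

The asymmetry described above shows up directly: `averageStars` scans only the `stars` array, and the repetitive `country` column collapses to two runs, but `row(2)` must visit every column array to rebuild a single record.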
