// Open the serialized file and parse the review back from it
FileInputStream input = new FileInputStream("review.dat");
review.parseFrom(input);
Parquet
License: Apache License, Version 2.0
Activity: Medium
Purpose: File Format
Official Page: http://parquet.io
Hadoop Integration: API Compatible
One of the most compelling ideas behind an open ecosystem of tools, such as Hadoop, is the
ability to choose the right tool for each specific job. For example, you have a choice between
tools like distcp (described here) or Flume (described here) for moving your data into your
cluster; Java MapReduce or Pig for building big data processing jobs; Puppet (described
here) or Chef (described here) for managing your cluster; and so on. This freedom of choice
sets Hadoop apart from many traditional platforms, which offer a single tool for each job; it
provides flexibility at the cost of complexity.
Parquet is one choice among many for managing the way your data is stored. It is a columnar
data storage format: values from the same column are stored together rather than record by
record. As a result, it performs very well with data that is structured and has a fair amount of
repetition. On the other hand, the Parquet format is fairly complex and does not perform as
well in cases where you want to retrieve entire records of data at a time.
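To make this trade-off concrete, here is a minimal Java sketch of the general idea behind a columnar layout. It does not use the Parquet library and does not reproduce Parquet's actual on-disk encoding. The class name ColumnarSketch, the sample review records, and the helpers column, runLengthEncode, and assembleRecord are all hypothetical names chosen for this illustration. It shows a repetitive column collapsing into a few runs, and a whole record being rebuilt only by reading every column.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnarSketch {
    // One run of identical consecutive values in a column.
    static final class Run {
        final String value;
        final int count;
        Run(String value, int count) { this.value = value; this.count = count; }
    }

    // Hypothetical sample records, stored row by row: {product, rating, reviewer}.
    static final String[][] ROWS = {
        {"widget", "5", "ann"},
        {"widget", "4", "bob"},
        {"widget", "5", "cat"},
        {"gadget", "2", "dan"}
    };

    // Columnar layout: gather one field from every record so its values sit together.
    static List<String> column(String[][] rows, int fieldIndex) {
        List<String> values = new ArrayList<>();
        for (String[] row : rows) {
            values.add(row[fieldIndex]);
        }
        return values;
    }

    // Run-length encoding: consecutive repeats collapse into one (value, count) pair.
    static List<Run> runLengthEncode(List<String> column) {
        List<Run> runs = new ArrayList<>();
        for (String value : column) {
            int last = runs.size() - 1;
            if (last >= 0 && runs.get(last).value.equals(value)) {
                runs.set(last, new Run(value, runs.get(last).count + 1));
            } else {
                runs.add(new Run(value, 1));
            }
        }
        return runs;
    }

    // Rebuilding one whole record means reading a value from every column.
    static String[] assembleRecord(List<List<String>> columns, int rowIndex) {
        String[] record = new String[columns.size()];
        for (int i = 0; i < columns.size(); i++) {
            record[i] = columns.get(i).get(rowIndex);
        }
        return record;
    }

    public static void main(String[] args) {
        // The repetitive product column shrinks to a handful of runs.
        for (Run run : runLengthEncode(column(ROWS, 0))) {
            System.out.println(run.value + " x" + run.count);
        }

        // Fetching a full record touches all three columns.
        List<List<String>> columns = new ArrayList<>();
        for (int field = 0; field < 3; field++) {
            columns.add(column(ROWS, field));
        }
        System.out.println(String.join(", ", assembleRecord(columns, 1)));
    }
}
```

In this sketch, the four product values reduce to two runs, while reading back a single record requires one lookup per column. That is the same shape of trade-off described above: repetition within a column is cheap to store, and whole-record retrieval is comparatively expensive.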