Writables.strings());
PTable<Integer, String> table = pipeline.read(source);
You can also read Avro datafiles into a PCollection as follows:
Source<WeatherRecord> source =
    From.avroFile(inputPath, Avros.records(WeatherRecord.class));
PCollection<WeatherRecord> records = pipeline.read(source);
Any MapReduce FileInputFormat (in the new MapReduce API) can be used as a TableSource by means of the formattedFile() method on From, providing Crunch access to the large number of different Hadoop-supported file formats. There are also more source implementations in Crunch than the ones exposed in the From class, including:
▪ AvroParquetFileSource for reading Parquet files as Avro PTypes.
▪ FromHBase, which has a table() method for reading rows from HBase tables into PTable<ImmutableBytesWritable, Result> collections. ImmutableBytesWritable is an HBase class for representing a row key as bytes, and Result contains the cells from the row scan, which can be configured to return only cells in particular columns or column families.
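As a sketch of how this fits together (assuming the crunch-hbase module is on the classpath; the table name "observations" and the column family "cf" are illustrative, and the Scan-accepting overload of table() is an assumption based on the Crunch API), reading an HBase table restricted to one column family might look like:

```java
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.hbase.FromHBase;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadSketch {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(HBaseReadSketch.class);
    // Configure the scan to return only cells in one column family
    // (hypothetical family name "cf").
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));
    // "observations" is a hypothetical HBase table name.
    PTable<ImmutableBytesWritable, Result> rows =
        pipeline.read(FromHBase.table("observations", scan));
    pipeline.done();
  }
}
```

The Scan object is the same one used in the regular HBase client API, so any filtering it supports (columns, families, row ranges) carries over to the Crunch source.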
Writing to a target
Writing a PCollection to a Target is as simple as calling PCollection's write() method with the desired Target. Most commonly, the target is a file, and the file type can be selected with the static factory methods on the To class. For example, the following line writes Avro files to a directory called output in the default filesystem:
collection.write(To.avroFile("output"));
This is just a slightly more convenient way of saying:
pipeline.write(collection, To.avroFile("output"));
Since the PCollection is being written to an Avro file, it must have a PType belonging to the Avro family, or the pipeline will fail.
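To make the family requirement concrete, here is a minimal sketch (the "input" and "output" paths are illustrative, and the PType-accepting overload of From.textFile() is an assumption): the collection is created with an Avro PType, so writing it to an Avro file succeeds, whereas the same pipeline built with Writables.strings() would fail at the write.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.From;
import org.apache.crunch.io.To;
import org.apache.crunch.types.avro.Avros;

public class AvroWriteSketch {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(AvroWriteSketch.class);
    // Read text lines with an Avro PType, so the collection belongs
    // to the Avro family and can be written as an Avro datafile.
    PCollection<String> lines =
        pipeline.read(From.textFile("input", Avros.strings()));
    lines.write(To.avroFile("output"));
    pipeline.done();
  }
}
```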
The To factory also has methods for creating text files, sequence files, and any MapReduce FileOutputFormat. Crunch also has built-in Target implementations for the Parquet file format (AvroParquetFileTarget) and HBase (ToHBase).
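A hedged sketch of those other To factory methods (the output paths are illustrative, and the formattedFile() signature shown is an assumption based on the Crunch API):

```java
import org.apache.crunch.PTable;
import org.apache.crunch.io.To;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TargetSketch {
  // Write the same table in three formats (hypothetical output paths).
  public static void writeCounts(PTable<Text, IntWritable> counts) {
    counts.write(To.textFile("counts-text"));     // plain text files
    counts.write(To.sequenceFile("counts-seq"));  // sequence files
    // Any new-API FileOutputFormat can be used as a target:
    counts.write(To.formattedFile("counts-fmt", SequenceFileOutputFormat.class));
  }
}
```

As with sources, the PType of the collection must be compatible with the chosen target format.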