CqlOutputFormat and CqlInputFormat
CqlOutputFormat and CqlInputFormat are Hadoop-specific output and input
formats for reducer and mapper tasks, respectively. They work much like
ColumnFamilyOutputFormat and ColumnFamilyInputFormat, but they give access to
CQL rows and support variable binding.
CqlInputFormat requires the keyspace and table name to be specified. You can use the
ConfigHelper class to set these and other options. Two settings worth tuning are the
input split size, set via ConfigHelper.setInputSplitSize, which defaults to about
64,000 rows, and the number of CQL rows per page, set via
CqlConfigHelper.setInputCQLPageRowSize, which defaults to 1,000 rows. It is a good idea
to make the page size as large as your machine can support without causing memory
issues, because fewer, larger pages reduce network overhead. You may also need to
specify the initial input address and the partitioner. To do so, use the
ConfigHelper.setInputInitialAddress and ConfigHelper.setInputPartitioner methods.
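To make the page-size advice concrete, here is a small back-of-the-envelope sketch of how many page fetches one input split needs at different page sizes. It assumes that a split holds exactly the configured number of rows and that each page costs one round trip; real split sizes are estimates, so treat the numbers as rough guidance. The class and method names (PageMath, pagesPerSplit) are made up for this illustration and are not part of Cassandra.

```java
public class PageMath {
    // Number of page fetches needed to read one split, rounding up so that a
    // partially filled last page still counts as a fetch.
    public static long pagesPerSplit(long splitRows, long pageRows) {
        if (splitRows < 0 || pageRows <= 0) {
            throw new IllegalArgumentException("splitRows must be >= 0 and pageRows > 0");
        }
        return (splitRows + pageRows - 1) / pageRows;
    }

    public static void main(String[] args) {
        long splitRows = 64_000; // split size quoted in the text
        for (long pageRows : new long[] {1_000, 5_000, 16_000}) {
            System.out.println(pageRows + " rows/page -> "
                    + pagesPerSplit(splitRows, pageRows) + " page fetches per split");
        }
    }
}
```

With the sizes quoted above, a split is read in 64 fetches at 1,000 rows per page. Raising the page size to 16,000 rows cuts that to 4, at the cost of holding more rows in memory at once.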
CqlOutputFormat allows the reducer task to write keys and values to the specified
CQL table. You need to set the output table, output initial address, and output partitioner
via ConfigHelper, and the CQL statement that updates the output table via
CqlConfigHelper.setOutputCql.
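The input and output settings described in this section could be combined in a job setup roughly as follows. This is a sketch, not a complete job: it assumes the Cassandra and Hadoop 2 libraries are on the classpath, and the keyspace, table names, column name, and address (demo_ks, events, event_counts, total, 127.0.0.1) are hypothetical placeholders. In the Cassandra releases that ship CqlInputFormat, the page-size setter is spelled setInputCQLPageRowSize and takes the size as a string. Mapper and reducer classes are omitted.

```java
import java.io.IOException;

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CqlJobSetup {
    // Builds a Job whose mappers read from one CQL table and whose reducers
    // write to another. All names and addresses below are placeholders.
    public static Job configure() throws IOException {
        Job job = Job.getInstance(new Configuration(), "cql-example");
        Configuration conf = job.getConfiguration();

        // Input side: keyspace and table are required; the rest is tuning.
        job.setInputFormatClass(CqlInputFormat.class);
        ConfigHelper.setInputColumnFamily(conf, "demo_ks", "events");
        ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");
        ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
        ConfigHelper.setInputSplitSize(conf, 64000);
        CqlConfigHelper.setInputCQLPageRowSize(conf, "1000");

        // Output side: target table, contact address, partitioner, and the
        // CQL statement whose bind markers the reducer's values fill in.
        job.setOutputFormatClass(CqlOutputFormat.class);
        ConfigHelper.setOutputColumnFamily(conf, "demo_ks", "event_counts");
        ConfigHelper.setOutputInitialAddress(conf, "127.0.0.1");
        ConfigHelper.setOutputPartitioner(conf, "Murmur3Partitioner");
        CqlConfigHelper.setOutputCql(conf, "UPDATE demo_ks.event_counts SET total = ?");

        return job;
    }
}
```

The partitioner passed to both sides should match the one the cluster is actually configured with; Murmur3Partitioner is only an assumption here.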