CqlOutputFormat and CqlInputFormat
CqlOutputFormat and CqlInputFormat are Hadoop-specific output and input formats for reducer and mapper tasks, respectively. They function much like ColumnFamilyOutputFormat and ColumnFamilyInputFormat, and add the ability to access CQL rows and to bind variables.
CqlInputFormat requires the keyspace and table name to be specified. You can use the ConfigHelper class to set these and other configuration options. Two settings worth reviewing are the input split size, set via ConfigHelper.setInputSplitSize, which defaults to 64,000 rows, and the number of CQL rows per page, set via CqlConfigHelper.setInputCQLPageRowSize, which defaults to 1,000 rows per page. It is a good idea to make the number of CQL rows per page as large as your machine can support without causing memory issues, because fewer, larger pages reduce network overhead. You may also need to specify the initial input address and the partitioner. To do so, use the ConfigHelper.setInputInitialAddress and ConfigHelper.setInputPartitioner methods.
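To make the trade-off between split size and page size concrete, the following sketch estimates how many page fetches a mapper needs to read one split. It is plain Java and does not call Cassandra. The class SplitPaging and the method pagesPerSplit are hypothetical names introduced for illustration, and the sketch assumes a split holds exactly the configured number of rows, which is only approximately true in practice.

```java
// Hypothetical helper for illustration only; it is not part of Cassandra or Hadoop.
final class SplitPaging {
    // Default values quoted in the text above.
    static final long DEFAULT_SPLIT_SIZE_ROWS = 64_000;
    static final long DEFAULT_PAGE_SIZE_ROWS = 1_000;

    /** Page fetches needed to read one split, rounding up for a partial last page. */
    static long pagesPerSplit(long splitSizeRows, long pageSizeRows) {
        if (splitSizeRows < 0 || pageSizeRows <= 0) {
            throw new IllegalArgumentException("split size must be >= 0 and page size must be > 0");
        }
        return (splitSizeRows + pageSizeRows - 1) / pageSizeRows;
    }

    public static void main(String[] args) {
        // With the defaults, each split is read in 64 page fetches.
        System.out.println(pagesPerSplit(DEFAULT_SPLIT_SIZE_ROWS, DEFAULT_PAGE_SIZE_ROWS));
        // Raising the page size to 8,000 rows cuts that to 8 fetches per split.
        System.out.println(pagesPerSplit(DEFAULT_SPLIT_SIZE_ROWS, 8_000));
    }
}
```

In a real job, the chosen values would be passed to ConfigHelper.setInputSplitSize and to the CQL page-size setter in CqlConfigHelper. The larger page size means fewer round trips, at the cost of more rows held in memory at once.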
CqlOutputFormat allows the reducer task to write keys and values to the specified CQL table. You need to set the output table, output initial address, and output partitioner via ConfigHelper, and the CQL statement that updates the output table via CqlConfigHelper.setOutputCql.
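As a rough illustration of what "keys and values" can look like on the reducer side, the sketch below builds one output record using only the Java standard library. It assumes the commonly used shape in which the key is a map from primary-key column name to serialized value and the value is a list of serialized variables bound, in order, to the ? placeholders of the output CQL. Check that shape against your Cassandra version. The table demo.word_count, its columns word and count_num, and the class OutputRecordSketch are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the table and column names are invented.
final class OutputRecordSketch {
    // Statement of the kind passed to CqlConfigHelper.setOutputCql.
    // Assumption: primary-key columns are taken from the key map, so only
    // the non-key column appears here as a bound variable.
    static final String OUTPUT_CQL = "UPDATE demo.word_count SET count_num = ?";

    /** Key part of the record: primary-key column name mapped to its UTF-8 bytes. */
    static Map<String, ByteBuffer> keyFor(String word) {
        Map<String, ByteBuffer> keys = new LinkedHashMap<>();
        keys.put("word", ByteBuffer.wrap(word.getBytes(StandardCharsets.UTF_8)));
        return keys;
    }

    /** Value part of the record: one 4-byte big-endian int for the single placeholder. */
    static List<ByteBuffer> variablesFor(int count) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(count);
        buf.flip(); // reset position so the buffer is ready to be read
        List<ByteBuffer> variables = new ArrayList<>();
        variables.add(buf);
        return variables;
    }
}
```

In a reducer, a pair such as keyFor("cassandra") and variablesFor(42) would be handed to the context's write method, and the output format would bind the variables to the configured statement.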