The following table explains what the parameters mean:

Parameter: -Dimporttsv.separator
Description: Here, our separator is a comma ( , ). The default value is a tab ( \t ).

Parameter: -Dimporttsv.columns=HBASE_ROW_KEY,f:max,f:min
Description: This is where we map the input file fields onto the HBase table. The first field, sensor_id, is our row key, which we denote with HBASE_ROW_KEY; the remaining fields are inserted into column family f. The second field, max temp, maps to f:max, and the last field, min temp, maps to f:min.

Parameter: sensors
Description: This is the name of the destination HBase table.

Parameter: hbase-import
Description: This is the HDFS directory where the input data files are located.
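For reference, here is a sketch of the complete command these parameters belong to, assembled from the table above (ImportTsv is HBase's standard TSV/CSV bulk import tool; the exact invocation in your environment may differ):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator=, \
    -Dimporttsv.columns=HBASE_ROW_KEY,f:max,f:min \
    sensors hbase-import

Each input line is then expected to look something like this (hypothetical sample values):

sensor_1,85,42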
When we run this command, a MapReduce job is kicked off; this is how the import is parallelized.
Also, from the console output, we can see that MapReduce is importing two files
as follows:
[main] mapreduce.JobSubmitter: number of splits:2
While the job is running, we can inspect the progress from YARN (or the JobTracker
UI).
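If we prefer the command line to the web UI, we can also check on the job from a terminal (assuming a YARN cluster; the application ID comes from the -list output):

yarn application -list
yarn application -status <application_id>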
One thing to note is that the MapReduce job consists only of mappers. This is because we are reading a set of files and inserting them into HBase directly; there is nothing to aggregate, so there is no need for reducers.
After the job is done, we can inspect the counters and see the following:
Map-Reduce Framework
Map input records=7
Map output records=7
This tells us that the mappers read seven records from the files and inserted seven records into HBase.
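As a quick sanity check, we can open the HBase shell and confirm that the rows arrived; the exact cell contents will depend on the input data:

hbase shell
count 'sensors'
scan 'sensors', {LIMIT => 2}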