The following table explains what the parameters mean:

Parameter: -Dimporttsv.separator
Description: Here, our separator is a comma ( , ). The default value is a tab ( \t ).

Parameter: -Dimporttsv.columns=HBASE_ROW_KEY,f:max,f:min
Description: This is where we map the input file fields onto the HBase table. The first field, sensor_id, is our row key, which we denote with HBASE_ROW_KEY; the remaining fields are inserted into column family f. The second field, max temp, maps to f:max, and the last field, min temp, maps to f:min.

Parameter: sensors
Description: This is the name of the destination HBase table.

Parameter: hbase-import
Description: This is the HDFS directory where the input data files are located.
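For reference, here is a sketch of the complete command these parameters belong to, assembled from the table above (ImportTsv is HBase's standard TSV/CSV bulk import tool; the exact invocation in your environment may differ):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator=, \
    -Dimporttsv.columns=HBASE_ROW_KEY,f:max,f:min \
    sensors hbase-import

Each input line is then expected to look something like this (hypothetical sample values):

sensor_1,85,42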
When we run this command, a MapReduce job is kicked off; this is how the import is parallelized.
Also, from the console output, we can see that MapReduce is importing two files
as follows:
[main] mapreduce.JobSubmitter: number of splits:2
While the job is running, we can inspect the progress from YARN (or the JobTracker
UI).
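If we prefer the command line to the web UI, we can also check on the job from a terminal (assuming a YARN cluster; the application ID comes from the -list output):

yarn application -list
yarn application -status <application_id>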
One thing to note is that the MapReduce job consists only of mappers. This is because we are reading a set of files and inserting them into HBase directly; there is nothing to aggregate, so there is no need for reducers.
After the job is done, we can inspect the counters and see the following:
Map-Reduce Framework
Map input records=7
Map output records=7
This tells us that the mappers read seven records from the files and inserted seven records into HBase.
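As a quick sanity check, we can open the HBase shell and confirm that the rows arrived; the exact cell contents will depend on the input data:

hbase shell
count 'sensors'
scan 'sensors', {LIMIT => 2}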