While inserting data from the HBase shell, the flush command can be
used to write the in-memory (memstore) data to the store files.
If a server fails, the WAL can be replayed to restore every edit made up to the
point of the crash. Hence, the WAL guarantees that the data is never lost. As
another level of assurance, the write-ahead log itself resides on HDFS, which is
a replicated filesystem, so any other server that can reach a replicated copy can
open the log.
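The log-then-replay idea described above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyWalStore and its methods are hypothetical names, the "durable" log is a plain in-memory list standing in for a file on HDFS, and none of this is HBase's actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of write-ahead logging; hypothetical names, not HBase classes.
class ToyWalStore {
    // Stands in for the durable log (in HBase this would live on HDFS).
    private final List<String[]> log;
    // Stands in for the volatile memstore, which is lost when the server crashes.
    private final Map<String, String> memstore = new TreeMap<>();

    ToyWalStore(List<String[]> durableLog) {
        this.log = durableLog;
    }

    // Record the edit in the log first, then apply it in memory.
    void put(String row, String value) {
        log.add(new String[] {row, value});
        memstore.put(row, value);
    }

    String get(String row) {
        return memstore.get(row);
    }

    // Rebuild the in-memory state on a fresh server by replaying the surviving log.
    static ToyWalStore recover(List<String[]> durableLog) {
        ToyWalStore store = new ToyWalStore(durableLog);
        for (String[] entry : durableLog) {
            store.memstore.put(entry[0], entry[1]);
        }
        return store;
    }
}
```

Because every edit reaches the log before the memstore, a second ToyWalStore built with recover() from the same list sees exactly the edits the first one had accepted, in the same order.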
The HLog class represents the WAL. When an HRegion object is instantiated, the
single HLog instance is passed as a parameter to the constructor of HRegion. In
the case of an update operation, the region saves the data directly to the shared
WAL and also keeps track of the changes by incrementing the sequence number for
each edit.
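A minimal sketch of the relationship just described, assuming hypothetical ToySharedLog and ToyRegion classes rather than the real HLog and HRegion: one log object is handed to every region's constructor, and each appended edit receives the next sequence number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared log; not the real HLog API.
class ToySharedLog {
    static final class Entry {
        final long sequenceId;
        final String region;
        final String edit;

        Entry(long sequenceId, String region, String edit) {
            this.sequenceId = sequenceId;
            this.region = region;
            this.edit = edit;
        }
    }

    private long lastSequenceId = 0;
    private final List<Entry> entries = new ArrayList<>();

    // Append one edit and return the sequence number assigned to it.
    synchronized long append(String region, String edit) {
        lastSequenceId++;
        entries.add(new Entry(lastSequenceId, region, edit));
        return lastSequenceId;
    }

    // Return a copy so callers cannot modify the log's contents.
    synchronized List<Entry> entries() {
        return new ArrayList<>(entries);
    }
}

// Hypothetical stand-in for a region that receives the shared log in its constructor.
class ToyRegion {
    private final String name;
    private final ToySharedLog log;

    ToyRegion(String name, ToySharedLog log) {
        this.name = name;
        this.log = log;
    }

    // Every update goes to the shared log, tagged with this region's name.
    long update(String edit) {
        return log.append(name, edit);
    }
}
```

Edits from different regions interleave in the one log, and the increasing sequence numbers preserve the order in which they were accepted.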
The WAL uses a Hadoop SequenceFile, which stores records as sets of key-value pairs.
Here, the HLogKey instance represents the key, and the KeyValue instance represents
the value. The KeyValue holds the row key, column family, column qualifier,
timestamp, type, and value, while the HLogKey records the region and table name
where the data needs to be stored. The KeyValue structure starts with two
fixed-length numbers that indicate the length of the key and the length of the
value. The following diagram shows the structure of a key-value pair:
[Figure: KeyValue layout. Key Length | Value Length | Key (Row Length, Row, Column Family Length, Column Family, Column Qualifier, Timestamp, Key Type) | Value]
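The layout in the diagram can be sketched as a byte-level encoder. The field widths used here are assumptions that the text does not state: 4-byte key and value lengths, a 2-byte row length, a 1-byte column family length, an 8-byte timestamp, and a 1-byte key type. KeyValueLayoutSketch and the type code used for a put are illustrative, and this is not the real KeyValue class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative encoder for the layout in the diagram; not HBase's KeyValue class.
class KeyValueLayoutSketch {
    // Assumed type code for a put; treat the exact value as an assumption.
    static final byte TYPE_PUT = 4;

    static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                         long timestamp, byte type, byte[] value) {
        // Key = row length (2) + row + family length (1) + family
        //     + qualifier + timestamp (8) + key type (1).
        int keyLength = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLength + value.length);
        buf.putInt(keyLength);            // fixed-length key length
        buf.putInt(value.length);         // fixed-length value length
        buf.putShort((short) row.length); // row length
        buf.put(row);
        buf.put((byte) family.length);    // column family length
        buf.put(family);
        buf.put(qualifier);               // its length is implied by the key length
        buf.putLong(timestamp);
        buf.put(type);
        buf.put(value);
        return buf.array();
    }

    // Read the row key back using the row length field.
    static String rowText(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        buf.getInt(); // skip key length
        buf.getInt(); // skip value length
        byte[] row = new byte[buf.getShort()];
        buf.get(row);
        return new String(row, StandardCharsets.UTF_8);
    }

    // Jump over the key using the two leading length fields and read the value.
    static String valueText(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        int keyLength = buf.getInt();
        byte[] value = new byte[buf.getInt()];
        buf.position(8 + keyLength);
        buf.get(value);
        return new String(value, StandardCharsets.UTF_8);
    }
}
```

The two leading lengths let a reader skip straight to the value or to the next record without parsing the key, which is why the structure begins with them.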
The WALEdit class takes care of atomicity at the log level by wrapping each
update. For example, in a multicolumn update for a row, each column is
represented as a separate KeyValue instance. If these instances were written to
the WAL one at a time and the server failed after logging only a few of them, the
row would be left half persisted and the remaining updates would be lost.
Atomicity is guaranteed by wrapping all the updates that make up a multicolumn
change into a single WALEdit instance and writing it in a single operation.
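The all-or-nothing behavior can be sketched with a length-prefixed record. This is a toy model, not the real WALEdit format: ToyWalEdit, its record layout, and the "family:qualifier=value" cell strings are all assumptions made for illustration. All cells of one row update are packed into one record, and a reader drops any record that was cut short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model of grouping several column updates into one log record.
class ToyWalEdit {
    // Pack every cell of one row update into a single length-prefixed record.
    static byte[] encodeRecord(List<String> cells) {
        int payloadLength = 4; // room for the cell count
        List<byte[]> encoded = new ArrayList<>();
        for (String cell : cells) {
            byte[] bytes = cell.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            payloadLength += 4 + bytes.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + payloadLength);
        buf.putInt(payloadLength);
        buf.putInt(encoded.size());
        for (byte[] bytes : encoded) {
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
        return buf.array();
    }

    // Read complete records; a record truncated by a crash is discarded whole.
    static List<List<String>> readRecords(byte[] log) {
        List<List<String>> edits = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(log);
        while (buf.remaining() >= 4) {
            int payloadLength = buf.getInt();
            if (payloadLength > buf.remaining()) {
                break; // torn record: none of its cells are applied
            }
            int count = buf.getInt();
            List<String> cells = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                byte[] bytes = new byte[buf.getInt()];
                buf.get(bytes);
                cells.add(new String(bytes, StandardCharsets.UTF_8));
            }
            edits.add(cells);
        }
        return edits;
    }
}
```

If the log ends partway through the second record, readRecords returns only the first one, so a row is either updated in all of its columns or in none of them.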
 