value=data1
000700 column=cf1:cq2, timestamp=1393866122073,
value=data2
000700 column=cf2:cq3, timestamp=1393866431669,
value=data4
000700 column=cf2:cq3, timestamp=1393866138714,
type=DeleteColumn
000700 column=cf2:cq3, timestamp=1393866138714,
value=data3
1 row(s) in 0.0370 seconds
When will the deleted entries be permanently removed? To answer this question,
it is necessary to understand how HBase processes operations and achieves
real-time read and write access. As mentioned earlier, an HBase table is split
into regions by row key range. Each region is served by a worker node.
During a put or delete operation against a particular region, the worker node
first writes the operation to its Write Ahead Log (WAL) file, which is shared by
the regions that the node serves. The WAL ensures that the operations are not lost
if a system fails. Next, the results of the operation are stored within the worker
node's RAM in a repository called MemStore [31].
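The ordering of this write path can be illustrated with a toy model. This is a minimal sketch, not HBase code: the class ToyWorkerNode and its methods mutate and recover are hypothetical, a Python list stands in for the WAL file, and a dictionary stands in for the MemStore. It shows why the log is written first: after the in-memory state is wiped, replaying the log rebuilds it.

```python
class ToyWorkerNode:
    """Hypothetical, simplified model of a worker node's write path."""

    def __init__(self):
        self.wal = []       # stands in for the durable Write Ahead Log file
        self.memstore = {}  # in-RAM store; its contents are lost if the node fails

    def _apply(self, record):
        op, row, column, ts, value = record
        # A delete only adds a tombstone marker; nothing is physically removed.
        self.memstore[(row, column, ts, op)] = value

    def mutate(self, op, row, column, ts, value=None):
        record = (op, row, column, ts, value)
        self.wal.append(record)  # 1. log the operation first ...
        self._apply(record)      # 2. ... then make it visible in the MemStore

    def recover(self):
        # Simulate a failure: RAM is wiped, then the WAL is replayed in order.
        self.memstore = {}
        for record in self.wal:
            self._apply(record)
```

Because the put and the DeleteColumn marker are both in the log, replaying it restores exactly the MemStore contents that existed before the simulated failure.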
Writing the entry to the MemStore provides the required real-time access: any
client can read the entries in the MemStore as soon as they are written. As
the MemStore grows, or at predetermined time intervals, its sorted contents are
written (flushed) to a file, known as an HFile, in HDFS on the same worker node.
A typical HBase implementation flushes the MemStore when its contents are
slightly less than the HDFS block size. Over time, these flushed files accumulate,
and the worker node performs a minor compaction, a sorted merge of several
flushed files into a larger HFile.
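The flush and minor-compaction cycle can be sketched in the same spirit. The sketch below assumes a hypothetical ToyStore class that flushes after a fixed number of entries (FLUSH_THRESHOLD is an illustrative stand-in for the size-based trigger) and represents each HFile as a sorted list of (key, value) pairs that is never modified after it is written. Because every flushed file is already sorted, a minor compaction reduces to a k-way merge, done here with Python's heapq.merge; this sketch keeps every entry during the merge.

```python
import heapq

FLUSH_THRESHOLD = 3  # hypothetical: flush after 3 entries (HBase uses a size in bytes)


class ToyStore:
    """Hypothetical model of MemStore flushes and minor compaction."""

    def __init__(self):
        self.memstore = {}  # key -> value, held in RAM
        self.hfiles = []    # each "HFile" is a sorted list of (key, value) pairs

    def write(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # Write the MemStore out in sorted key order, then clear the RAM copy.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def minor_compaction(self):
        # Sorted merge of all flushed files into a single, larger file.
        merged = list(heapq.merge(*self.hfiles))
        self.hfiles = [merged]
```

For example, writing the keys r5, r1, r3, r4, r2, r6 produces two flushed files of three sorted entries each; after minor_compaction(), a single file holds all six entries in key order.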
Meanwhile, any get or scan requests that the worker node receives examine these
possible storage locations:
• MemStore
• HFiles resulting from MemStore flushes
• HFiles from minor compactions
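A get must therefore consult every one of these locations and pick the newest visible cell. The sketch below is hypothetical: each cell is a tuple (row, column, timestamp, type, value), the MemStore and each HFile are plain lists of cells, and only the DeleteColumn marker type seen in the scan output above is handled. A DeleteColumn marker masks puts whose timestamps are less than or equal to its own, so at equal timestamps the marker is ranked ahead of the put.

```python
def get_latest(row, column, memstore, hfiles):
    """Return the newest visible value for row/column, or None if deleted or absent.

    Every possible location is examined: the MemStore plus every HFile
    (whether produced by a flush or by a minor compaction).
    """
    matches = [cell for source in [memstore, *hfiles] for cell in source
               if cell[0] == row and cell[1] == column]
    if not matches:
        return None
    # Highest timestamp wins; at equal timestamps the DeleteColumn marker wins,
    # because it masks puts with timestamps less than or equal to its own.
    newest = max(matches, key=lambda c: (c[2], c[3] == "DeleteColumn"))
    return None if newest[3] == "DeleteColumn" else newest[4]


# Hypothetical placement of the cells from the scan output above.
memstore = [("000700", "cf2:cq3", 1393866431669, "Put", "data4")]
newer_hfile = [("000700", "cf2:cq3", 1393866138714, "DeleteColumn", None)]
older_hfile = [("000700", "cf2:cq3", 1393866138714, "Put", "data3"),
               ("000700", "cf1:cq2", 1393866122073, "Put", "data2")]

print(get_latest("000700", "cf2:cq3", memstore, [newer_hfile, older_hfile]))  # data4
print(get_latest("000700", "cf2:cq3", [], [newer_hfile, older_hfile]))        # None
```

With the MemStore included, the later put (data4) is returned; with only the two HFiles, the tombstone hides data3 even though both cells are still physically stored.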
Thus, in the case of a delete operation followed relatively quickly by a get
operation on the same row, the tombstone marker is found in the MemStore and
the corresponding previous versions in the smaller HFiles or previously merged