Cloudware Application Development - Guide to Cloud Computing for Business and Technology Managers

of thousands of HDFS clients per cluster since each DataNode may execute multiple application tasks simultaneously. The DataNodes are responsible for managing read and write requests from the file system's clients, maintaining blocks, and performing replication as directed by the NameNode. Block management in HDFS differs from that of a normal file system. The size of the data file equals the actual length of the block. This means that if a block is half full, it needs only half of the space of a full block on the local drive. Storage is therefore used compactly, and no extra space is consumed by a partially filled block, unlike in a regular file system.
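The space saving described above can be checked with a little arithmetic. The following is a minimal sketch, not HDFS code: the function name `block_file_sizes` is hypothetical, the 128 MB block size is an assumed configuration value that the text does not state, and per-block checksum metadata is ignored.

```python
from typing import List

MB = 1024 * 1024


def block_file_sizes(file_size: int, block_size: int = 128 * MB) -> List[int]:
    """Return the size in bytes of each block's data file for one HDFS file.

    Every block except possibly the last is full; the last block's data
    file is only as long as the data it actually holds.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # partially filled last block
    return sizes


def padded_allocation(file_size: int, block_size: int = 128 * MB) -> int:
    """Space a file would take if every block were padded to full size."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


sizes = block_file_sizes(200 * MB)
print([s // MB for s in sizes])            # sizes of the block data files in MB
print(sum(sizes) // MB)                    # space actually used, in MB
print(padded_allocation(200 * MB) // MB)   # space used if blocks were padded
```

Under these assumptions, a 200 MB file occupies one full 128 MB block and one 72 MB block, so each replica uses 200 MB on disk instead of the 256 MB that fixed-size allocation would require.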

3. Image: An image represents the metadata of the namespace (inodes and lists of blocks). On startup, the NameNode loads the entire namespace image and keeps it pinned in memory. Holding the image in memory enables the NameNode to service multiple client requests concurrently.

4. Journal: The journal is the modification log of the image, kept in the local host's native file system. During normal operation, each client transaction is recorded in the journal, and the journal file is flushed and synced before the acknowledgment is sent to the client. On startup or during recovery, the NameNode can replay this journal.

5. Checkpoint: To enable recovery, a persistent record of the image is also stored in the local host's native file system; this record is called a checkpoint. Once the system has started, the NameNode never modifies or updates the checkpoint file. A new checkpoint file can be created during the next startup, on a restart, or on demand when requested by the administrator or by the CheckpointNode.
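The interplay of image, journal, and checkpoint can be illustrated with a toy model. This is a sketch under stated assumptions, not the NameNode implementation: the class `TinyNameNode`, the JSON file formats, the file names, and the choice to start an empty journal after each checkpoint are all hypothetical simplifications.

```python
import json
import os
import tempfile


class TinyNameNode:
    """Toy model: in-memory image, on-disk journal, on-disk checkpoint."""

    def __init__(self, storage_dir):
        self.checkpoint_path = os.path.join(storage_dir, "checkpoint.json")
        self.journal_path = os.path.join(storage_dir, "journal.log")
        self.image = {}  # path -> list of block ids, held entirely in memory
        self._recover()
        self._journal = open(self.journal_path, "a", encoding="utf-8")

    def _recover(self):
        # Start from the last checkpoint, if one exists ...
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path, encoding="utf-8") as f:
                self.image = json.load(f)
        # ... then replay every journaled transaction on top of it.
        if os.path.exists(self.journal_path):
            with open(self.journal_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, txn):
        if txn["op"] == "create":
            self.image[txn["path"]] = txn["blocks"]
        elif txn["op"] == "delete":
            self.image.pop(txn["path"], None)

    def commit(self, txn):
        # Record the transaction, then flush and sync before acknowledging.
        self._journal.write(json.dumps(txn) + "\n")
        self._journal.flush()
        os.fsync(self._journal.fileno())
        self._apply(txn)
        return "ack"

    def write_checkpoint(self):
        # Write a NEW checkpoint file and swap it in; the old file is
        # never edited in place. Assumption: the journal restarts empty.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.image, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.checkpoint_path)
        self._journal.close()
        self._journal = open(self.journal_path, "w", encoding="utf-8")

    def close(self):
        self._journal.close()


def recover_after_restart():
    """Commit, checkpoint, commit again, then rebuild the image from disk."""
    with tempfile.TemporaryDirectory() as storage_dir:
        node = TinyNameNode(storage_dir)
        node.commit({"op": "create", "path": "/a", "blocks": [1, 2]})
        node.write_checkpoint()
        node.commit({"op": "create", "path": "/b", "blocks": [3]})
        node.close()  # simulate a shutdown

        restarted = TinyNameNode(storage_dir)  # checkpoint + journal replay
        image = dict(restarted.image)
        restarted.close()
        return image


print(recover_after_restart())
```

In this sketch the restarted node recovers `/a` from the checkpoint and `/b` from the journal, which mirrors the division of labor described in items 3 to 5: the checkpoint holds a point-in-time image, and the journal holds everything acknowledged since.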

17.3.2 HBase

HBase is an open-source, nonrelational, column-oriented, multidimensional, distributed database modeled on Google's BigTable architecture. It is designed with high availability and high performance as drivers to support storage and processing of large data sets on the Hadoop framework. HBase is not a database in the purist sense of the term. It offers very high scalability and performance and supports certain features of an ACID-compliant database. HBase is classified as a NoSQL database because its architecture and design are closely aligned with BASE (Basically Available, Soft state, Eventually consistent). Why do we need HBase when the data are stored in HDFS, which is the core data storage layer within Hadoop? HBase is useful for operations other than MapReduce execution, for operations that are awkward to perform directly on HDFS, and for workloads that need random access to data. First, it provides a database-style interface to Hadoop, which enables developers to deploy
