Storing and Configuring Data with Hadoop, YARN, and ZooKeeper - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset


A full explanation of these administration commands is beyond the scope of this chapter, but with the dfsadmin command you can manage quotas, control upgrades, refresh the nodes, and enter safe mode. See the Hadoop site, hadoop.apache.org, for full information.
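
As a rough illustration of the dfsadmin tasks just listed, the following sketch strings a few of them together. It assumes the Hadoop V2 entry point `hdfs dfsadmin` (on V1 the equivalent is `hadoop dfsadmin`). The directory `/user/hadoop/demo`, the quota value of 1000, and the `run` and `dfsadmin_tour` helpers are hypothetical names chosen for this example. By default the script only prints the commands rather than executing them.

```shell
#!/bin/sh
# Sketch of common dfsadmin tasks, not a definitive procedure.
# Assumes the Hadoop V2 form "hdfs dfsadmin"; Hadoop V1 uses "hadoop dfsadmin".
# QUOTA_DIR and the quota value are hypothetical examples.

QUOTA_DIR="${QUOTA_DIR:-/user/hadoop/demo}"   # hypothetical HDFS directory
DRY_RUN="${DRY_RUN:-1}"                       # 1 = print commands, 0 = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

dfsadmin_tour() {
  run hdfs dfsadmin -report                       # capacity and data node status
  run hdfs dfsadmin -safemode get                 # query the safe mode state
  run hdfs dfsadmin -safemode enter               # put the name node into safe mode
  run hdfs dfsadmin -safemode leave               # return to normal operation
  run hdfs dfsadmin -setQuota 1000 "$QUOTA_DIR"   # cap the number of names under the directory
  run hdfs dfsadmin -clrQuota "$QUOTA_DIR"        # remove that name quota
  run hdfs dfsadmin -refreshNodes                 # re-read the include and exclude host files
  run hdfs dfsadmin -finalizeUpgrade              # finalize a completed HDFS upgrade
}

dfsadmin_tour
```

Run as is, the script lists the commands it would issue. Setting `DRY_RUN=0` on a cluster node, as a user with HDFS administration rights, would execute them for real, so review the quota directory first.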

Summary

This chapter has introduced you to the installation and use of both Hadoop V1 and V2. It should be clear that using the CDH stack release greatly simplifies both installing and using Hadoop.

In the course of this chapter you installed Hadoop V1 manually from a package downloaded from the Hadoop site. You then installed V2 and YARN from CDH packages using the yum command. In V2, the servers for HDFS and YARN are started as Linux services rather than by scripts, as they were in V1. Also, in the CDH release, logs, binaries, and configuration were separated into their own specific directories.

You have seen the same MapReduce task run on both versions of Hadoop. Task run times were comparable between V1 and V2. However, V2 can support a larger production cluster than V1 can. (In the following chapters you will look at MapReduce programming in Java and Pig.)

You have also configured Hadoop V2 across a mini cluster, with name nodes and data nodes on different servers. You have installed and used ZooKeeper, setting up a quorum and using the client. (The next chapter discusses HBase, the Hadoop database, which relies on ZooKeeper.)

Lastly, you have looked at the file system command set and at the user and administration commands. It was only a brief look, but further information is available at the Hadoop website.
