less common than node failures, so replicating across fewer racks doesn't
have an appreciable impact on availability.
NOTE
The replica placement approach is subject to change, as the HDFS
developers consider it a work in progress. As they learn more about
usage patterns, they plan to update the policies to deliver the optimal
balance of performance and availability.
HDFS monitors the replication levels of files to ensure that the replication factor
is being met. If a computer hosting a DataNode crashes, or a network
rack is taken offline, the NameNode flags the absence of heartbeat
messages. If the nodes are offline for too long, the NameNode stops
forwarding requests to them, and it also checks the replication factors of any
data blocks associated with those nodes. If the replication factor has fallen
below the threshold set when the file was created, the NameNode initiates
replication of those blocks again.
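As a rough illustration (using standard Hadoop shell commands; exact options and output formats vary between Hadoop versions), you can check replication health and adjust a file's replication factor from the command line. The path /example/data.txt is only a placeholder:

hadoop fsck / -files -blocks
hadoop fs -setrep -w 3 /example/data.txt

The first command reports any under-replicated or missing blocks across the file system; the second sets the replication factor of a single file to 3 and waits until the new replicas have been created.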
Using Common Commands to Interact with HDFS
This section discusses interacting with HDFS. Even though HDFS is a
distributed file system, you can interact with it in much the same way as you
do with a traditional file system. However, this section covers some key
differences. The command examples in the following sections work with
the Hortonworks Data Platform environment set up in Chapter 3, “Installing
HDInsight.”
Interfaces for Working with HDFS
By default, HDFS includes two mechanisms for working with it. The primary
way to interact with it is by the use of a command-line interface. For status
checks, reporting, and browsing the file system, there is also a web-based
interface.
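For example, assuming a default configuration (the port is configurable and differs between Hadoop releases), the NameNode exposes its web interface on port 50070, so a URL such as the following opens the status and file-browsing pages:

http://<namenode-host>:50070/

Here <namenode-host> is a placeholder for the actual host name of your NameNode.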
The hadoop command is a script that can run several modules of the Hadoop system.
The two modules used for HDFS are dfs (also known as FsShell)
and dfsadmin. The dfs module is used for most common file operations,