Database Reference
In-Depth Information
Using the distributed copy (DistCp)
Distributed copy (DistCp) is a Hadoop utility used to copy data in parallel within and
between clusters. It uses Hadoop's MapReduce to perform the copy operation. DistCp is the
most widely used data transfer tool in Hadoop clusters. For example:
$ hadoop distcp hdfs://namenode1/src hdfs://namenode2/dest
The preceding command would copy the src folder and all its contents from the cluster
managed by namenode1 to the cluster managed by namenode2 as the dest folder.
DistCp, by default, does not overwrite the files at the target location and skips copying
them if the files already exists. However, files can be forced to be overwritten using the
overwrite flag.
There are several options that can be used along with the Hadoop distcp command and
the details of these options can be found at http://www.cloudera.com/content/cloudera-con-
tent/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cd-
h5ig_distcp_data_cluster_migrate.html .
Search WWH ::




Custom Search