NOTE
If you are using HDInsight with ASV, you will have less need to move
data between clusters, because containers in Azure Storage can be
shared between clusters; there is no need to copy the data.
However, you may still need to copy data from one container to
another. You can do this from the Azure Storage Explorer
(http://azurestorageexplorer.codeplex.com) if you would
like a graphical user interface (GUI). You can also use the same HDFS
commands (including distcp) to work with ASV; just use the
appropriate qualifier and reference to the container (for example,
asv:///MyAsvContainer/MyData/Test.txt).
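As a rough sketch of the container-to-container copy mentioned in the note, the following Python helper assembles a hadoop distcp invocation from two ASV paths. The function name build_distcp_command and the destination container MyOtherContainer are hypothetical, and the URI form simply follows the example in the note; check which qualifier your cluster expects before running anything.

```python
import shlex


def build_distcp_command(source_uri, target_uri):
    """Return the argument list for a distcp copy from source_uri to target_uri."""
    for uri in (source_uri, target_uri):
        # This sketch only handles ASV paths; other schemes are rejected.
        if not uri.startswith("asv://"):
            raise ValueError(f"expected an asv:// URI, got {uri!r}")
    return ["hadoop", "distcp", source_uri, target_uri]


command = build_distcp_command(
    "asv:///MyAsvContainer/MyData",    # source path, following the note's example
    "asv:///MyOtherContainer/MyData",  # hypothetical destination container
)
# Print the command for review instead of executing it.
print(shlex.join(command))
```

On a machine with the Hadoop client installed, the returned list could be handed to subprocess.run; printing it first makes it easy to check the paths before any data moves.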
Implementing Data Structures for Easier Management
HDFS, being a file system, is organized into directories. Many commands
work with directories as well as with files, and a number of them also
support the -R parameter for applying the command recursively across all
child directories. Security can also be managed more easily for folders than
for individual files.
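To illustrate the recursive option, here is a minimal Python sketch that builds a hadoop fs command with -R so that one call covers a directory and everything beneath it. The helper name build_recursive_command, the directory /data/weblogs, and the mode 750 are assumptions chosen for illustration; the set of actions is limited to a few that are known to accept -R.

```python
# Actions in "hadoop fs" that accept -R; extend only after checking your Hadoop version.
RECURSIVE_ACTIONS = {"chmod", "chown", "chgrp"}


def build_recursive_command(action, value, directory):
    """Return the argument list that applies action recursively to directory."""
    if action not in RECURSIVE_ACTIONS:
        raise ValueError(f"unsupported action for this sketch: {action!r}")
    return ["hadoop", "fs", "-" + action, "-R", value, directory]


# Hypothetical example: restrict a whole directory tree with one command.
command = build_recursive_command("chmod", "750", "/data/weblogs")
print(" ".join(command))
```

Because the permission change is applied at the directory level, files added under a differently secured folder are not affected, which is the management benefit described above.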
Given this, it is effective to organize your data files into a hierarchical
folder structure that reflects the source, usage, and application of the
data. Related files then sit under a common parent directory, so a single
recursive command or security setting can cover all of them.
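One way to keep such a hierarchy consistent is to generate paths from named segments rather than typing them by hand. The sketch below is only an illustration: the helper build_data_path, the /data root, and the segment values are all hypothetical, and the order of the segments is the design decision that matters.

```python
import posixpath


def build_data_path(root, *segments):
    """Join root and the non-empty segments into a single HDFS-style path."""
    cleaned = [s.strip("/") for s in segments if s.strip("/")]
    return posixpath.join(root, *cleaned)


# Hypothetical segment values; the same data can be laid out source-first or usage-first.
source_first = build_data_path("/data", "source_a", "usage_x", "app_1")
usage_first = build_data_path("/data", "usage_x", "source_a", "app_1")

print(source_first)
print(usage_first)
```

Whichever segment comes first becomes the level at which recursive commands and permissions are easiest to apply, so it should be the attribute the business most needs to separate.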
Consider, for example, a company that manages several websites. The
website traffic logs are being captured into HDFS, along with user activity
logs. Each activity log has its own distinct format. When creating a folder
structure for storing this information, you would consider whether it is
more important to segment the data by the site that originated it or by
the type of data. Which aspect is more important likely depends on the
business needs. For this example, suppose that the originating site is the
most critical element, because this company keeps their website information
heavily separated and secured. You might use a folder structure like this
one: