Figure 2-2. HDInsight with Azure blob storage
As illustrated in Figure 2-2, the master node and the worker nodes in an HDInsight cluster use WASB storage by
default, but they can also fall back to the traditional Hadoop Distributed File System (HDFS). With the default WASB
configuration, the nodes in turn use the underlying containers in Windows Azure blob storage.
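Hadoop code running on the cluster addresses data in those containers through WASB URIs of the form
wasb[s]://<container>@<account>.blob.core.windows.net/<path>, where the wasbs scheme uses TLS. The following minimal
sketch only builds such a URI string; the helper name wasb_uri and the container and account names are hypothetical
placeholders, not part of any Azure or Hadoop API.

```python
def wasb_uri(container: str, account: str, path: str = "", secure: bool = True) -> str:
    """Build a WASB URI for a blob path inside an Azure storage container.

    Hypothetical helper: format is wasb[s]://<container>@<account>.blob.core.windows.net/<path>
    """
    # "wasbs" is the TLS-secured variant of the scheme.
    scheme = "wasbs" if secure else "wasb"
    # Drop any leading slash so the path is joined to the host exactly once.
    path = path.lstrip("/")
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path}"


if __name__ == "__main__":
    # "mycontainer" and "mystorageaccount" are placeholder names.
    print(wasb_uri("mycontainer", "mystorageaccount", "/example/data/input.txt"))
```

Because the URI names the container and storage account rather than any particular cluster, the same path stays valid
after the cluster that produced the data has been deleted.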
Uploading Data to Windows Azure Storage Blob
Windows Azure HDInsight clusters are typically deployed to execute MapReduce jobs and are deleted once those jobs have
completed. Retaining large volumes of data in HDFS after the computations are done is not cost-effective. Windows
Azure Blob Storage is a highly available, scalable, high-capacity, low-cost, and shareable storage option for data that
is to be processed using HDInsight. Storing data in WASB makes your HDInsight clusters independent of the underlying
storage used for computation, so you can safely release those clusters without losing data.
The first step toward deploying an HDInsight solution on Azure is to decide on a way to upload data to WASB
efficiently. We are talking about big data here: the data that needs to be uploaded for processing will typically
amount to terabytes or even petabytes. This section highlights some off-the-shelf tools from third parties that can
help in uploading such large volumes to WASB storage. Some of these tools are free; others must be purchased.
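A quick back-of-the-envelope calculation shows why upload efficiency matters at this scale. The sketch below estimates
transfer time from data size and sustained bandwidth; the 100 Mbit/s link and the efficiency factor are illustrative
assumptions, not measured figures, and the estimate ignores protocol overhead beyond that factor.

```python
def upload_hours(size_tb: float, bandwidth_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours to upload size_tb terabytes (10**12 bytes each).

    bandwidth_mbps is the link speed in megabits per second; efficiency (0-1]
    is an assumed fraction of that bandwidth actually achieved.
    """
    bits = size_tb * 1e12 * 8                           # terabytes -> bits
    bits_per_second = bandwidth_mbps * 1e6 * efficiency
    return bits / bits_per_second / 3600                # seconds -> hours


if __name__ == "__main__":
    # Assumed 100 Mbit/s link at full efficiency.
    print(f"1 TB: {upload_hours(1, 100):.1f} hours")
    print(f"1 PB: {upload_hours(1000, 100) / 24:.0f} days")
```

At an assumed 100 Mbit/s, a single terabyte takes roughly 22 hours and a petabyte close to 926 days, which is why
tools that parallelize uploads, or higher-bandwidth links, become important.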
Azure Storage Explorer: A free tool available from codeplex.com that provides a graphical user interface for
managing your Azure Blob containers. It supports all three types of Azure storage: blobs, tables, and queues. The
tool can be downloaded from:
http://azurestorageexplorer.codeplex.com/