Windows Azure Storage Blob
The underlying storage infrastructure for Azure is known as Windows Azure Blob Storage (WABS). Microsoft has
implemented a thin wrapper that exposes this blob storage as an HDFS file system for HDInsight. This wrapper is
referred to as Windows Azure Storage Blob (WASB) and is a notable change in Microsoft's Hadoop implementation
on Windows Azure.
As you have seen, WASB replaces HDFS and is the default storage for your HDInsight clusters. It is important to
understand the WASB issues you may encounter during job submission, because all your input files are in WASB
and all the output files written by Hadoop also land in your cluster's dedicated WASB container.
WASB Authentication
One of the most common errors encountered during cluster operations is the following:
org.apache.hadoop.fs.azure.AzureException:
Unable to access container <container> in account <storage_account>
using anonymous credentials, and no credentials found for them in the configuration.
This message essentially means that the WASB code couldn't find the key for the storage account in the
configuration.
Typically, the problem is one of two things:
- The key is not present in core-site.xml, or it is there but not in the correct format. This is
  usually easy to check (assuming you can use Remote Desktop to connect to your cluster). Look
  in the cluster's core-site.xml (C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\conf\core-site.xml) for
  the configuration name-value pair whose name is fs.azure.account.key.<account>.
- The key is there in core-site.xml, but the process running into this exception is not reading
  core-site.xml. Most Hadoop components (MapReduce, Hive, and so on) read core-site.xml
  from that location for their configuration, but some don't. For example, Oozie has its own
  copy of core-site.xml that it uses. This is harder to chase, but if you're using a non-standard
  Hadoop component, this might be the culprit.
Confirm your storage account key in the Azure Management portal and make sure that the core-site.xml file
contains the correct entry for it.
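The first check described above can be sketched in a few lines of Python. This is only an illustration, not a tool shipped with HDInsight: the function names, the sample XML, and the account name mystorage are all hypothetical, and the sketch assumes the usual Hadoop layout of <property> elements with <name> and <value> children. It accepts both the bare fs.azure.account.key.<account> name used in this section and a name that continues with a dotted host suffix after the account.

```python
import xml.etree.ElementTree as ET

# Property-name prefix taken from the text above.
KEY_PREFIX = "fs.azure.account.key."

# Hypothetical sample configuration; on a cluster you would read the real
# core-site.xml from disk instead.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.azure.account.key.mystorage</name>
    <value>PLACEHOLDER_KEY==</value>
  </property>
  <property>
    <name>fs.azure.account.key.emptyaccount</name>
    <value></value>
  </property>
</configuration>"""


def find_account_keys(core_site_xml):
    """Return {property name: value} for every storage-account key entry."""
    root = ET.fromstring(core_site_xml)
    keys = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith(KEY_PREFIX):
            keys[name] = value
    return keys


def check_account(core_site_xml, account):
    """Report 'missing', 'empty', or 'present' for the given account's key."""
    for name, value in find_account_keys(core_site_xml).items():
        suffix = name[len(KEY_PREFIX):]
        # Match the bare account name or the account followed by a host suffix.
        if suffix == account or suffix.startswith(account + "."):
            return "present" if value else "empty"
    return "missing"


if __name__ == "__main__":
    for account in ("mystorage", "emptyaccount", "otheraccount"):
        print(account, "->", check_account(SAMPLE_CORE_SITE, account))
```

A "present" result only shows that an entry exists. It does not prove that the value matches the key shown in the portal, and it says nothing about components such as Oozie that read their own copy of core-site.xml.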
Azure Throttling
Windows Azure Blob Storage limits the bandwidth per storage account to maintain high storage availability for all
customers. Limiting bandwidth is done by rejecting requests to storage (HTTP response 500 or 503) in proportion
to recent requests that are above the allocated bandwidth. To learn about such storage account limits, refer to the
following page:
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
Your cluster will be throttled whenever it writes data to or reads data from WASB at rates greater than the
limits described on that page. You can estimate whether you might hit those limits from the size of your cluster
and your workload type.
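As a rough illustration of that estimate, the sketch below multiplies cluster size by an assumed per-task I/O rate and compares the result with a storage account bandwidth limit. Every number in it is a hypothetical placeholder, including the limit: take the real figure for your account type from the scalability targets page, and measure the per-task rate for your own workload. The function and parameter names are invented for this example.

```python
def estimated_throughput_gbps(nodes, tasks_per_node, mb_per_sec_per_task):
    """Aggregate cluster I/O rate against WASB, in gigabits per second."""
    total_mb_per_sec = nodes * tasks_per_node * mb_per_sec_per_task
    # Convert megabytes per second to gigabits per second.
    return total_mb_per_sec * 8 / 1000


def may_be_throttled(nodes, tasks_per_node, mb_per_sec_per_task, account_limit_gbps):
    """True if the estimated rate exceeds the storage account limit."""
    rate = estimated_throughput_gbps(nodes, tasks_per_node, mb_per_sec_per_task)
    return rate > account_limit_gbps


if __name__ == "__main__":
    # Placeholder limit, NOT an official figure.
    limit_gbps = 10.0
    # Hypothetical small cluster with a light workload.
    print(may_be_throttled(4, 4, 10.0, limit_gbps))
    # Hypothetical larger cluster with an I/O-heavy workload.
    print(may_be_throttled(32, 4, 50.0, limit_gbps))
```

The estimate is deliberately coarse. It treats all tasks as running at once at a steady rate, so it is best read as an upper bound that tells you whether throttling is worth investigating.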
 