Real Hadoop jobs have recurring task startup delays, so the actual number of machines required to exceed the
limit is generally higher than calculated.
Note
Some initial indications that your job is being throttled by Windows Azure Storage may include the following:
Longer-than-expected job completion times
A high number of task failures
Job failure
Although these symptoms suggest that your cluster is being throttled, the most reliable way to confirm it is to
inspect the responses returned by Windows Azure Storage. Responses with an HTTP status code of 500 or 503 indicate
that a request has been throttled. One way to collect Windows Azure Storage responses is to turn on storage logging,
as described in http://www.windowsazure.com/en-us/manage/services/storage/how-to-monitor-a-storage-account/#configurelogging.
This is also discussed earlier, in Chapter 11.
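Once logging is on, the check for 500 and 503 responses can be automated. The following is a minimal Python sketch, not an official tool. It assumes that log entries are semicolon-delimited and that the HTTP status code is the fifth field, as in version 1.0 of the storage log format. The function name count_throttled and the sample entries are hypothetical and shortened for illustration.

```python
from collections import Counter

# Assumption: log entries are semicolon-delimited and the HTTP status code
# is the fifth field (index 4), as in version 1.0 of the storage log format.
HTTP_STATUS_FIELD = 4
THROTTLE_CODES = {"500", "503"}


def count_throttled(log_lines):
    """Return a Counter of 500/503 status codes found in the given log lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.strip().split(";")
        if len(fields) <= HTTP_STATUS_FIELD:
            continue  # skip blank or malformed entries
        status = fields[HTTP_STATUS_FIELD]
        if status in THROTTLE_CODES:
            counts[status] += 1
    return counts


# Hypothetical, heavily shortened log entries for illustration only.
sample = [
    "1.0;2014-01-01T00:00:00.0000000Z;GetBlob;Success;200;12;10",
    "1.0;2014-01-01T00:00:01.0000000Z;PutBlock;ThrottlingError;503;3;3",
    "1.0;2014-01-01T00:00:02.0000000Z;GetBlob;ServerTimeoutError;500;90000;90000",
]
print(count_throttled(sample))
```

A nonzero count that grows with cluster size is a stronger signal of throttling than slow jobs alone.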
To avoid throttling, you can adjust parameters in the WASB driver self-throttling mechanism. The WASB driver is
the HDInsight component that reads data from and writes data to WASB. The driver has a self-throttling mechanism
that can slow the transfer rate of each individual virtual machine (VM), which in turn slows the overall transfer
rate between a cluster and WASB. The degree of slowdown can be adjusted to keep transfer rates below throttling
thresholds.
By default, the self-throttling mechanism is exercised for clusters with n >= 7 (where n is the number of nodes), and it
increasingly slows transfer rates as n increases. The default rate at which self-throttling is imposed is set at cluster
creation time (based on the cluster size), but it is configurable after cluster creation.
The self-throttling algorithm works by delaying a request to WASB in proportion to the end-to-end latency of the
previous request. The exact proportion is determined by the following parameters (configurable in core-site.xml or
at job submission time):
fs.azure.selfthrottling.read.factor (used when reading data from WASB)
fs.azure.selfthrottling.write.factor (used when writing data to WASB)
Note
Valid values for these settings are in the following range: (0, 1).
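The text says only that the delay is proportional to the previous request's end-to-end latency; it does not give the exact formula. The following Python sketch shows one way such a rule could work. It assumes that the factor is the fraction of the unthrottled rate to keep, so that delay = (1/factor - 1) * latency. This formula and the function names are illustrative assumptions, not a confirmed description of the driver.

```python
def throttle_delay_ms(previous_latency_ms, factor):
    """Delay to insert before the next request.

    Assumed rule (not confirmed by the text): delay = (1/factor - 1) * latency.
    """
    if not 0 < factor < 1:
        raise ValueError("factor must be in the range (0, 1)")
    return (1.0 / factor - 1.0) * previous_latency_ms


def effective_rate_fraction(factor, latency_ms=100.0):
    """Fraction of the unthrottled request rate that remains after the delay."""
    delay_ms = throttle_delay_ms(latency_ms, factor)
    return latency_ms / (latency_ms + delay_ms)


# With a factor of 0.5, each request waits as long as the previous one took,
# which halves the request rate.
print(throttle_delay_ms(100.0, 0.5))
print(effective_rate_fraction(0.5))
```

Under this assumption, a smaller factor means a longer delay, and the resulting transfer rate is roughly the factor times the unthrottled rate.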
Example 1: If your cluster has n=20 nodes and is primarily doing heavy write operations, you can calculate the
appropriate fs.azure.selfthrottling.write.factor value (for a storage account with geo-replication on) as follows:
fs.azure.selfthrottling.write.factor = 5Gbps/(800Mbps * 20) = 0.32
Example 2: If your cluster has n=20 nodes and is doing heavy read operations, you can calculate the appropriate
fs.azure.selfthrottling.read.factor value (for a storage account with geo-replication off) as follows:
fs.azure.selfthrottling.read.factor = 15Gbps/(1600Mbps * 20) = 0.48
Both results take 1 Gbps as 1024 Mbps.
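The arithmetic in the two examples can be sketched in Python as follows. The function name self_throttling_factor is hypothetical, and the conversion of 1 Gbps to 1024 Mbps is an assumption chosen because it reproduces the values in the examples.

```python
# Assumption: 1 Gbps = 1024 Mbps, which reproduces the values in the examples.
MBPS_PER_GBPS = 1024


def self_throttling_factor(account_limit_gbps, per_node_mbps, nodes):
    """Ratio of the storage account limit to the cluster's aggregate transfer rate."""
    aggregate_mbps = per_node_mbps * nodes
    return account_limit_gbps * MBPS_PER_GBPS / aggregate_mbps


# Example 1: heavy writes, geo-replication on, 20 nodes.
print(round(self_throttling_factor(5, 800, 20), 2))    # 0.32
# Example 2: heavy reads, geo-replication off, 20 nodes.
print(round(self_throttling_factor(15, 1600, 20), 2))  # 0.48
```

A result of 1 or more means the cluster's aggregate rate is already below the account limit, which falls outside the valid (0, 1) range for these settings.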
If you still find that throttling continues after adjusting the parameter values just shown, further analysis and
adjustment may be necessary.
 
 