Preparing for Outages
Although preparing for outages is primarily a disaster planning and recovery concern, it is
also highly relevant to cost and capacity optimization. Public cloud outages can cause
significant revenue losses for a company and can even alienate an organization's main
customers. Companies should generally follow three key principles: prepare for failure,
design for failure, and have an alternative.
Engineers should have a good understanding of the system's weak points. They should
also be prepared for a real disaster, and one way to do that is to carry out service outage
drills. Cloud environments are composed of many machines, and machines fail. The system
must be designed to handle failure; although designing for failure can be expensive, it may
pay off in the end.
Furthermore, a few mission-critical services can be deployed and served out of alternative
data centers if necessary. This helps minimize the risk of a “total blackout” during an
outage of the particular public cloud data center housing an organization's applications.
One option is to deploy applications in multiple regions and in isolated locations within
those regions, such as by using Amazon's Regions and Availability Zones. The idea is to
spread the chances of failure over a number of locations, which reduces the probability of
a total blackout, because failures are usually isolated to specific locations and do not
spread outward to other regions. See the following location for information on Amazon's
Regions and Availability Zones:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
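As a rough illustration of this idea, the following sketch uses Python and boto3 to discover
the Availability Zones in a region and launch one instance in each of them. The region, AMI
ID, and instance type are placeholder assumptions, not values tied to any particular
deployment.

# Sketch: spread identical instances across every available Availability Zone
# in a region so a single-zone failure cannot take the whole application down.
# Assumes boto3 is installed and AWS credentials are configured; the AMI ID
# and instance type are placeholders.
import boto3

REGION = "us-east-1"               # region housing the application (assumption)
AMI_ID = "ami-0123456789abcdef0"   # placeholder AMI for the application server
INSTANCE_TYPE = "t3.micro"         # placeholder instance type

ec2 = boto3.client("ec2", region_name=REGION)

# Discover the Availability Zones currently available in the region.
zones = [
    z["ZoneName"]
    for z in ec2.describe_availability_zones(
        Filters=[{"Name": "state", "Values": ["available"]}]
    )["AvailabilityZones"]
]

# Launch one instance per zone.
for zone in zones:
    ec2.run_instances(
        ImageId=AMI_ID,
        InstanceType=INSTANCE_TYPE,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )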
Fine-Tuning Auto-Scaling Rules
Applications that can automatically scale the number of server instances offer flexibility
and a great opportunity for optimization. For example, you could have an auto-scaling rule
that spawns a new instance once CPU utilization reaches 80 percent on all current instances
and another that spawns one once average CPU utilization reaches 50 percent.
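Rules like these are typically expressed as a scaling policy tied to a metric alarm. The
following Python sketch, using boto3, wires a CloudWatch alarm on average CPU utilization
to a simple scale-out policy on an existing Auto Scaling group; the group name, thresholds,
and evaluation periods are illustrative assumptions.

# Sketch: a threshold-based scale-out rule like the 80 percent example above.
# Assumes an existing Auto Scaling group; its name and the alarm parameters
# are placeholders.
import boto3

ASG_NAME = "web-tier-asg"   # placeholder Auto Scaling group name
REGION = "us-east-1"

autoscaling = boto3.client("autoscaling", region_name=REGION)
cloudwatch = boto3.client("cloudwatch", region_name=REGION)

# A simple scaling policy that adds one instance each time it is triggered.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName=ASG_NAME,
    PolicyName="scale-out-on-high-cpu",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=300,
)

# A CloudWatch alarm that fires the policy once average CPU utilization across
# the group stays at or above 80 percent for two consecutive 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-scale-out",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": ASG_NAME}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[policy["PolicyARN"]],
)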
However, how do businesses know whether 80 percent and 50 percent are the right percentages?
There are two ways to determine the right thresholds. The first is trial and error: an
organization stress tests its application and determines the load at which the application's
response time starts lagging behind its usual response time or a noticeable delay appears.
The second is to calculate the maximum number of tasks, users, or processes an application
can handle simultaneously and convert that figure into a percentage of compute capacity.
Factors other than compute capacity can also be included, such as memory footprint, network
utilization, and disk utilization.
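A minimal arithmetic sketch of the second approach follows, assuming a per-instance limit
measured during a stress test and roughly linear CPU growth with load; all numbers are
illustrative assumptions.

# Sketch: convert a measured per-instance capacity limit into a CPU-utilization
# threshold for the scaling rule. The figures below are assumptions, not values
# from the text.

MAX_CONCURRENT_USERS = 400   # users one instance handled before lagging (assumed)
CPU_AT_MAX_LOAD = 95.0       # CPU % observed at that load during the stress test (assumed)
SAFETY_MARGIN = 0.8          # scale out before reaching the breaking point

# CPU percentage consumed per concurrent user, assuming roughly linear scaling.
cpu_per_user = CPU_AT_MAX_LOAD / MAX_CONCURRENT_USERS

# Threshold at which to add capacity: 80 percent of the observed breaking point.
scale_out_threshold = CPU_AT_MAX_LOAD * SAFETY_MARGIN
target_users = scale_out_threshold / cpu_per_user

print(f"Scale out at {scale_out_threshold:.0f}% CPU "
      f"(~{target_users:.0f} concurrent users per instance)")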
Even so, you may still need to experiment with different combinations to get the thresholds
exactly right before you can achieve considerable optimization.