The primary strategy for dealing with this problem in user-facing services is graceful
degradation. This topic was covered in Section 2.1.10.
Dynamic Resource Allocation
Another strategy is to add capacity dynamically. With this approach, a system would
detect that a service is becoming overloaded and allocate an unused machine from a pool
of idle machines that are running but otherwise unconfigured. An automated system would
configure the machine and use it to add capacity to the overloaded service, thereby
resolving the issue.
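The detect-allocate-configure loop described above can be sketched in a few lines. This is a minimal illustration, not a production autoscaler: the 80% threshold, the per-machine capacity figure, and the `configure` hook are all hypothetical stand-ins for whatever monitoring and configuration tooling a real site uses.

```python
from dataclasses import dataclass, field

# Assumed policy: a service counts as overloaded above 80% utilization.
OVERLOAD_THRESHOLD = 0.80


@dataclass
class Service:
    name: str
    capacity_per_machine: float  # requests/sec one machine can serve (assumed known)
    machines: list[str] = field(default_factory=list)

    def utilization(self, load: float) -> float:
        """Fraction of current capacity that `load` (requests/sec) consumes."""
        if not self.machines:
            return float("inf")  # no machines at all: treat as fully overloaded
        return load / (len(self.machines) * self.capacity_per_machine)


def configure(machine: str, service: Service) -> None:
    """Hypothetical hook: install software, push config, register with the load balancer."""


def scale_if_overloaded(service: Service, load: float, idle_pool: list[str]) -> bool:
    """Detect overload and pull idle machines into `service` until it is back
    under the threshold or the pool runs dry. Returns True if anything was added."""
    added = False
    while service.utilization(load) > OVERLOAD_THRESHOLD and idle_pool:
        machine = idle_pool.pop()      # take an unused, unconfigured machine
        configure(machine, service)    # automated configuration step
        service.machines.append(machine)
        added = True
    return added
```

For example, a service with two machines rated at 100 requests/sec facing 300 requests/sec runs at 150% utilization; the loop adds one machine (100%), then a second (75%), and stops because it is now under the threshold.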
It can be costly to have idle capacity, but this cost can be mitigated by using a shared
pool. That is, one pool of idle machines serves a group of services. The first service to
become overloaded allocates the machines. If the pool is large enough, it can supply more
than one service that becomes overloaded at the same time. There should also be a
mechanism for services to give back machines when the need disappears.
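A shared pool with both allocation and give-back might look like the following sketch. The class name `SharedPool` and its first-come, first-served policy are illustrative assumptions; a real pool manager would also need quotas, health checks, and wiping of returned machines.

```python
from collections import defaultdict


class SharedPool:
    """One pool of idle machines shared by a group of services (illustrative sketch)."""

    def __init__(self, machines: list[str]) -> None:
        self.idle: list[str] = list(machines)
        self.assigned: defaultdict[str, list[str]] = defaultdict(list)  # service -> machines held

    def allocate(self, service: str, count: int) -> list[str]:
        """Grant up to `count` idle machines to `service`, first come, first served."""
        granted = self.idle[:count]
        del self.idle[:count]
        self.assigned[service].extend(granted)
        return granted

    def release(self, service: str, count: int) -> list[str]:
        """Give up to `count` machines back to the idle pool once the need disappears."""
        held = self.assigned[service]
        returned: list[str] = []
        while held and len(returned) < count:
            returned.append(held.pop())
        self.idle.extend(returned)
        return returned
```

Because machines flow back into `idle` on release, capacity that one service borrowed during a spike becomes available to the next service that needs it.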
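Additional capacity can be found at other service providers as well. A public cloud
computing provider can be used as the shared pool. Usually you will not have to pay for
unused capacity, because such providers typically bill only for the resources you actually
allocate.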
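One way to combine a local idle pool with a public cloud is to draw from local machines first and request only the shortfall from the provider. In this sketch `provision_from_cloud` is a hypothetical callback standing in for whatever API a particular provider offers; it is not a real library call.

```python
from typing import Callable


def acquire_machines(
    count: int,
    local_idle: list[str],
    provision_from_cloud: Callable[[int], list[str]],  # hypothetical provider hook
) -> list[str]:
    """Take machines from the local idle pool first; ask the cloud for any shortfall."""
    from_local = local_idle[:count]
    del local_idle[:count]  # those machines are no longer idle
    shortfall = count - len(from_local)
    # Only call the provider when local capacity is insufficient, so no cloud
    # resources (and no cloud charges) are incurred otherwise.
    from_cloud = provision_from_cloud(shortfall) if shortfall > 0 else []
    return from_local + from_cloud
```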
Shared resource pools are not just appropriate for machines, but may also be used for
storage and other resources.
Load Shedding
Another strategy is load shedding. With this strategy, the service turns away some users so
that other users can have a good experience.
To make an analogy, an overloaded phone system doesn't suddenly disconnect all existing
calls. Instead, it responds to any new attempts to make a call with a "fast busy" tone so
that the person will try to make the call later. An overloaded web site should likewise give
some users an immediate response, such as a simple "come back later" web page, rather
than requiring them to time out after minutes of waiting.
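The "fast busy" behavior can be approximated with a cap on concurrent requests: requests already admitted run to completion, while new arrivals beyond the cap are refused at once instead of queuing. This is a minimal sketch; the `LoadShedder` name and the `max_in_flight` limit are illustrative, and returning HTTP status 503 (Service Unavailable) for the "come back later" page is our choice of convention.

```python
import threading


class LoadShedder:
    """Admit at most `max_in_flight` concurrent requests; reject the rest immediately."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, request_handler):
        # Non-blocking acquire: fail fast rather than make the caller wait and time out.
        if not self._slots.acquire(blocking=False):
            return 503, "Service busy, please come back later."
        try:
            return 200, request_handler()  # admitted requests are never cut off
        finally:
            self._slots.release()
```

The refusal path does almost no work, so the overloaded service spends its capacity on the users it has already admitted.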
A variation of load shedding is stopping certain tasks that can be put off until later. For
example, low-priority database updates could be queued up for processing later; a social
network that stores reputation points for users might store the fact that points have been
awarded rather than processing them; nightly bulk file transfers might be delayed if the
network is overloaded.
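Deferring low-priority work can be sketched as a queue that records what is owed while the system is overloaded and drains it when load subsides. The `Task` and `DeferringExecutor` names are hypothetical, and "running" a task is simulated by appending its name to a list. A real system would also bound how long a task may wait in the queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    low_priority: bool


class DeferringExecutor:
    """Run high-priority tasks now; while overloaded, park low-priority ones for later."""

    def __init__(self) -> None:
        self.deferred: deque[Task] = deque()
        self.completed: list[str] = []

    def submit(self, task: Task, overloaded: bool) -> None:
        if overloaded and task.low_priority:
            # Record that the work is owed instead of doing it now,
            # e.g. "reputation points awarded to user 42".
            self.deferred.append(task)
        else:
            self.completed.append(task.name)  # stand-in for actually running the task

    def drain(self, budget: int) -> int:
        """When load subsides, run up to `budget` deferred tasks in arrival order."""
        ran = 0
        while self.deferred and ran < budget:
            self.completed.append(self.deferred.popleft().name)
            ran += 1
        return ran
```

Draining with a budget lets the backlog be worked off gradually, so catching up on deferred tasks does not itself push the service back into overload.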
That said, tasks that can be put off for a couple of hours might cause problems if they are
put off forever. There is, after all, a reason they exist. For any activity that is delayed due
to load shedding, there must be a plan for how such a delay is handled. Establish a service
level agreement (SLA) to determine how long something can be delayed and to identify a