using heat that would destroy most objects, but the process instead makes steel stronger.
The mythical Hydra is antifragile because it grows two heads for each one it loses. The
way we learn is antifragile: studying for an exam involves working our brain hard, which
strengthens it.
Fragile systems break when the unexpected happens. Therefore, if we want our distributed
computing systems to be antifragile, we should introduce randomness, frequently and
actively triggering the resiliency features. Every time a malfunction causes a team to fail
over a database to a replica, the team gets more skilled at executing this procedure. They
may also get ideas for improvements. Better execution and procedural improvements make
the system stronger.
To accelerate this kind of improvement, we introduce malfunctions artificially. Rather
than trying to avoid malfunctions, we instigate them. If a failover process is broken, we
want to learn this fact in a controlled way, preferably during normal business hours when
the largest number of people are awake and available to respond. If we do not trigger failures
in a controlled way, we will learn about such problems only during a real emergency:
usually after hours, usually when everyone is asleep.
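The idea of instigating malfunctions deliberately, at random, but only when people are around to respond, can be sketched as a small drill scheduler. This is a minimal illustration under stated assumptions, not the authors' tooling: the weekday 09:00 to 17:00 window, the per-round drill probability, the database names, and the `trigger_failover` hook are all hypothetical placeholders that a real team would replace with its own schedule and failover procedure.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional

# Assumed "business hours" window (local time, weekdays only); hypothetical values.
BUSINESS_START_HOUR = 9
BUSINESS_END_HOUR = 17


def in_business_hours(now: datetime) -> bool:
    """Return True Monday through Friday, from the start hour up to (not including) the end hour."""
    return now.weekday() < 5 and BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR


@dataclass
class DrillDecision:
    run: bool
    target: Optional[str]
    reason: str


def plan_failover_drill(
    now: datetime,
    databases: List[str],
    probability: float,
    rng: random.Random,
) -> DrillDecision:
    """Randomly decide whether to fail over one database right now.

    Drills are only scheduled during business hours, so a broken failover
    is discovered while people are awake and available to respond.
    """
    if not in_business_hours(now):
        return DrillDecision(False, None, "outside business hours")
    # rng.random() is in [0.0, 1.0), so probability=1.0 always runs and 0.0 never does.
    if rng.random() >= probability:
        return DrillDecision(False, None, "not selected this round")
    return DrillDecision(True, rng.choice(databases), "drill scheduled")


def run_drill(decision: DrillDecision, trigger_failover: Callable[[str], None]) -> None:
    """Invoke the caller-supplied (hypothetical) failover hook if a drill was chosen."""
    if decision.run and decision.target is not None:
        trigger_failover(decision.target)


if __name__ == "__main__":
    rng = random.Random(42)
    dbs = ["orders-db", "users-db", "billing-db"]  # hypothetical database names
    decision = plan_failover_drill(datetime(2024, 3, 6, 14, 0), dbs, 0.25, rng)
    run_drill(decision, lambda name: print(f"failing over {name} to its replica"))
    print(decision)
```

Keeping the random source injectable (`rng`) makes the schedule reproducible in tests, while a production scheduler would typically use an unseeded generator so that drills stay unpredictable.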
15.1.2 Reducing Risk
Practicing risky procedures in a controlled fashion enables us to reduce risk. Do not confuse
risky behavior with risky procedures. Risky behavior is by definition risky and should
be avoided. Procedures that are risky can be improved. For example, being careless is risky
behavior that should be avoided. There's no way to make carelessness less risky other than
to stop being careless. The risk is built in. However, a procedure such as a web service failover
may or may not be risky. If it is a risky procedure, it can be improved and reengineered
to make it less risky.
When we confuse risky behavior with risky procedures, we end up applying a good
practice to the wrong thing. We make the mistake of avoiding procedures that should,
instead, be repeated often until the risk is removed. The technical term for improving
something through repetition is "practice." Practice makes perfect.
The traditional way of thinking is that computer systems are fragile and must be protected.
As a consequence, they are surrounded by protection systems, whether these are
management policies or technical features. Instead of focusing on building a moat around
our operations, we should commit to antifragile practices to improve the strength and confidence
of our systems and organizations.
This new attitude toward failure aligns management practices with the reality of complex
systems:
• New software releases always have new unexpected bugs.