member starts by checking the bug tracking system to review the bugs assigned to him or
her, or possibly to review unassigned issues of higher priority the team member might need
to take on.
Software development in operations tends to mirror the Agile methodology: rather than
making large, sudden changes, many small projects evolve the system over time. Chapter
12 will discuss automation and software engineering topics in more detail.
Projects that do not involve software development may involve technical work. Moving
a service to a new datacenter is highly technical work that cannot be automated because it
happens infrequently.
Operations staff tend not to physically touch hardware, not just because of the heavy
use of virtual machines but also because even physical machines are housed in datacenters
that are located far away. Datacenter technicians act as remote hands, applying physical
changes when needed.
Oncall Days
Oncall days are spent working on projects until an alert is received, usually by SMS text
message or pager.
Once an alert is received, the issue is worked until it is resolved. Often there are multiple
solutions to a problem, usually including one that will fix the problem quickly but
temporarily and others that are long-term fixes. Generally the quick fix is employed because
returning the service to normal operating parameters is paramount.
Once the alert is resolved, a number of other tasks should always be done. The alert
should be categorized and annotated in some form of electronic alert journal so that trends
may be discovered. If a quick fix was employed, a bug should be filed requesting a
longer-term fix. The oncall person may take some time to update the playbook entry for this
alert, thereby building organizational memory. If there was a user-visible outage or an SLA
violation, a postmortem report should be written. An investigation should be conducted to
ascertain the root cause of the problem. Writing a postmortem report, filing bugs, and
identifying root causes are all ways that we raise the visibility of issues so that they get
attention. Otherwise, we will continually muddle through ad hoc workarounds and nothing will
ever get better. Postmortem reports (possibly redacted for technical content) can be shared
with the user community to build confidence in the service.
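
The follow-up rules above can be expressed as a small checklist generator. The following is
only a minimal sketch under stated assumptions, not a description of any particular alerting
tool: the ResolvedAlert record, its field names, and the follow_up_tasks function are all
hypothetical and exist purely to illustrate the decision logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResolvedAlert:
    """Hypothetical record of one resolved alert (all field names are illustrative)."""
    name: str
    category: str
    notes: str
    quick_fix_used: bool = False
    user_visible_outage: bool = False
    sla_violated: bool = False


def follow_up_tasks(alert: ResolvedAlert) -> List[str]:
    """Return the follow-up tasks the text prescribes for a resolved alert."""
    # Every alert is categorized and annotated in the alert journal.
    tasks = [f"journal: record '{alert.name}' under category '{alert.category}'"]

    # A quick (temporary) fix always needs a bug requesting the longer-term fix.
    if alert.quick_fix_used:
        tasks.append(f"bug: request longer-term fix for '{alert.name}'")

    # A user-visible outage or an SLA violation calls for a postmortem
    # and a root cause investigation.
    if alert.user_visible_outage or alert.sla_violated:
        tasks.append(f"postmortem: write report and find root cause of '{alert.name}'")

    return tasks


if __name__ == "__main__":
    alert = ResolvedAlert(
        name="disk-full",
        category="capacity",
        notes="Cleared temporary files to free space.",
        quick_fix_used=True,
    )
    for task in follow_up_tasks(alert):
        print(task)
```

Updating the playbook entry is left out of the sketch because the text treats it as something
the oncall person may do at their discretion rather than a required step.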
The benefit of having a specific person assigned to oncall duty at any given time is that
it enables the rest of the team to remain focused on project work. Studies have found that
the key to software developer productivity is to have long periods of uninterrupted time.
That said, if a major crisis appears, the oncall person will pull people away from their
projects to assist.