Biomedical Engineering Reference
act as a true system administrator and set up a system from scratch.
Following installation by our supplier, all day-to-day maintenance could
be handled by a nominated member of the computational team who had
access to an external consultant if required. The disadvantage of buying
a pre-configured system became apparent when there was a problem
after the warranty had expired. The master server developed a
problem in passing jobs to the compute nodes. The supplier would no
longer support the system and our Linux consultants could not resolve
the issue because they did not understand how the cluster was meant
to be working.
In the end, the situation was resolved by the purchase of a new server.
With this server, and all subsequent servers, we have taken the approach
of buying them without any operating system (OS) installed. We are
therefore able to load the OS of our choice and configure the server to
our own individual requirements. This can delay deployment, but it
does mean that when there are problems, and there have been some, we are
much better able to resolve them.
Lesson: Understand your own hardware or know somebody who does.
This highlighted two other important points. First, the design of our
system as it was, with a single server, meant there was no built-in
redundancy. When the server went down, the disk arrays went down.
Consequently, none of the Windows folders mapped to the disk array
were available; this had an impact across the whole of the company.
Second, it showed that IT hardware needs replacing in a timely manner
even if it is working without fault. This can be a difficult sell to
management as IT hardware is often viewed with the mantra 'if it ain't
broke don't fix it'; however, at OGT we take the approach that three
years is the normal IT lifespan and, although some hardware is not
replaced at this milestone, we ensure that all the hardware for our
business-critical systems is. This, coupled with a warranty that provides
next-business-day support, means that OGT has a suitable level of
resource available in the event of any IT hardware issues.
Lesson: Take a modular approach and split functions between
systems.
Subsequently, we have developed a more component-based approach
of dividing operations between different servers so that even when one
server goes down unexpectedly, other services are not affected. For
example, there is a single server providing file access, another for running
a BLAT server, another running our MySQL database, etc. This is
illustrated in Figure 10.2.
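
The benefit of splitting functions between servers can be sketched in a few lines of Python. This is only an illustration: the host names (server-a, server-b, server-c) and the helper services_lost are hypothetical, and only the three services themselves come from the text. The sketch compares which services are lost when one machine fails under a single-server layout and under a modular one.

```python
# Hypothetical host names; only the three services are named in the text.
SINGLE_SERVER = {
    "file access": "server-a",
    "BLAT server": "server-a",
    "MySQL database": "server-a",
}

MODULAR = {
    "file access": "server-a",
    "BLAT server": "server-b",
    "MySQL database": "server-c",
}


def services_lost(layout, failed_host):
    """Return the services that become unavailable when failed_host goes down."""
    return sorted(
        service for service, host in layout.items() if host == failed_host
    )


# With one server, a single failure takes every service down.
print(services_lost(SINGLE_SERVER, "server-a"))
# With one service per server, the same failure affects file access only.
print(services_lost(MODULAR, "server-a"))
```

Under the single-server layout every service depends on the same machine, whereas under the modular layout the failure of any one machine removes only the service it hosts.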