Other questions should also be discussed from day 1. Why so early? If your
organization is spinning up its first big data/Hadoop team for development
and there is no one in operations who has any experience in either, it will
take them many months to come up to speed on big data and Hadoop (so
that they can effectively support it). Why months? Because they most likely
still have existing duties to handle, and apart from a few weeks of
specialized training, everything else they learn will come from
working part time with the project team. It is vital to the success of the
project, though, that you have a skilled and confident operations team ready
to support the solution when it is ready to be deployed.
Before deployment of the solution, the development team should work with
the operations team to develop a run book that documents the architecture
of the solution and gives operations the responses to specific, expected
failures. Typical plans should account for the following:
• If a particular job fails, how to respond. What should one look for to see
the state of the job?
• If a node fails, how to respond.
• If connectivity to the source system is down, whom to notify.
• What are the common error messages that your system will surface?
What are the steps to resolve these error messages?
• How should the team proactively monitor the environment? For
example, should it track trends in performance and space usage?
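As one illustration of space usage trending, a run book entry could include a simple projection of when storage will cross an alert level. The sketch below fits a straight line to periodic used-space samples and estimates how many days remain. It is a hypothetical example, not part of Hadoop or any monitoring tool: the `UsageSample` type, the daily sampling, the terabyte units, and the 80% threshold are all assumptions. The used-space figures themselves would come from whatever the cluster reports, for example the "DFS Used" value from `hdfs dfsadmin -report`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class UsageSample:
    """Hypothetical record: one storage measurement taken by the ops team."""
    day: float      # days since monitoring began
    used_tb: float  # storage in use on that day, in terabytes


def fit_linear_trend(samples: Sequence[UsageSample]) -> Tuple[float, float]:
    """Ordinary least-squares fit of used_tb ~= slope * day + intercept."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = sum(s.day for s in samples) / n
    mean_y = sum(s.used_tb for s in samples) / n
    sxx = sum((s.day - mean_x) ** 2 for s in samples)
    if sxx == 0:
        raise ValueError("samples must be taken on more than one day")
    sxy = sum((s.day - mean_x) * (s.used_tb - mean_y) for s in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_threshold(
    samples: Sequence[UsageSample],
    capacity_tb: float,
    threshold: float = 0.8,  # assumed alert level: 80% of capacity
) -> Optional[float]:
    """Days after the latest sample until the fitted trend reaches
    threshold * capacity_tb.

    Returns None when the trend is flat or shrinking, and 0.0 when the
    fitted line has already crossed the threshold.
    """
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None
    limit = threshold * capacity_tb
    crossing_day = (limit - intercept) / slope
    latest_day = max(s.day for s in samples)
    return max(0.0, crossing_day - latest_day)


if __name__ == "__main__":
    history = [UsageSample(0, 10.0), UsageSample(1, 12.0), UsageSample(2, 14.0)]
    print(days_until_threshold(history, capacity_tb=100.0))
```

With usage growing by 2 TB per day from 10 TB on a 100 TB cluster, the fitted line reaches the 80 TB alert level on day 35, which is 33 days after the latest sample. A straight line is the simplest possible model; if data ingestion accelerates over time, the projection will be optimistic, so the run book should say how often to refresh it.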
Now that you have developed the run book, you can plan for the handoff to
operations and place the solution in production.
After Deployment
During the first week after handing off to operations, the development team
and operations need to work closely together to ensure proper knowledge
transfer of the solution. Much of this should already have happened while
documenting the solution and creating the run book, but we have all seen
documentation go unread.
Many tasks need to be done to keep the Hadoop cluster healthy and ready
to continue to accept more data and to process that data in an acceptable
timeframe as defined by your service level agreements (SLAs). These tasks
include the previously mentioned job monitoring, managing the various