Improving Mean Time Between Failures
You can avoid a lot of downtime with a little due diligence. When we categorized
downtime incidents and attributed them to root causes, we also identified ways they
could have been prevented. We found that most downtime incidents can be averted
through an overall common-sense approach to managing systems. The following
suggestions are selected from the guidelines in the white paper we wrote detailing the
results of our analysis:
• Test your recovery tools and procedures, including restores from backups.
• Follow the principle of least privilege.
• Keep your systems clean and neat.
• Use good naming and organization conventions to avoid confusion, such as
whether servers are for development or production use.
• Upgrade your database server on a prudent schedule to keep it current.
• Test carefully with a tool such as pt-upgrade from Percona Toolkit before
upgrading.
• Use InnoDB, configure it properly, and ensure that it is set as the default storage
engine and that the server cannot start if it is disabled.
• Make sure the basic server settings are configured properly.
• Disable DNS lookups with skip_name_resolve.
• Disable the query cache unless it has proven beneficial.
• Avoid complexity, such as replication filters and triggers, unless absolutely needed.
• Monitor important components and functions, especially critical items such as
disk space and RAID volume status, but avoid false positives by alerting only on
conditions that reliably indicate problems.
• Record as many historical metrics as possible about server status and performance,
and keep them forever if you can.
• Test replication integrity on a regular basis.
• Make replicas read-only, and don't let replication start automatically.
• Perform regular query reviews.
• Archive and purge unneeded data.
• Reserve some space in filesystems. In GNU/Linux, you can use the -m option (of
mke2fs or tune2fs, on ext filesystems) to reserve space in the filesystem itself.
You can also leave space free in your LVM volume group. Or, perhaps simplest of
all, just create a large dummy file that you can delete if the filesystem becomes
completely full.[2]
[2] It's 100% cross-platform-compatible!
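The dummy-file suggestion in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method from the white paper: the function names (create_ballast, release_ballast, free_fraction) and the 1 MiB chunk size are hypothetical choices. The file is filled with real writes rather than created sparse, so that the blocks are actually allocated. On filesystems that compress or deduplicate data, a zero-filled file may still occupy little space, so check the result on your own system.

```python
import os
import shutil

CHUNK = 1024 * 1024  # write 1 MiB at a time (arbitrary choice)


def create_ballast(path, size_bytes):
    """Write size_bytes of zeros to path so the space is really allocated."""
    block = b"\0" * CHUNK
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before returning


def release_ballast(path):
    """Delete the ballast file; return the bytes freed (0 if it was absent)."""
    try:
        size = os.path.getsize(path)
        os.remove(path)
    except FileNotFoundError:
        return 0
    return size


def free_fraction(path):
    """Return the fraction of the filesystem holding path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

A typical use would be to call create_ballast once when the server is provisioned, with a hypothetical path on the data volume and a size of a few gigabytes, and to call release_ballast only in an emergency when the filesystem has filled up. free_fraction could feed a disk-space alert of the kind recommended in the monitoring bullet.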
 