Terms to Know
Server: Software that provides a function or application program interface (API).
(Not a piece of hardware.)
Service: A user-visible system or product composed of many servers.
Machine: A virtual or physical machine.
QPS: Queries per second. Usually the number of web hits or API calls received
per second.
Traffic: A generic term for queries, API calls, or other requests sent to a server.
Performant: A system whose performance conforms to (meets or exceeds) the
design requirements. A neologism from merging “performance” and
“conformant.”
Application Programming Interface (API): A protocol that governs how one
server talks to another.
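To make the QPS term concrete, here is a minimal sketch that computes it from a list of request arrival times. The function name, the timestamps, and the window length are assumptions made for illustration; they are not part of the definitions above.

```python
def queries_per_second(timestamps, window_start, window_seconds):
    """Average QPS over [window_start, window_start + window_seconds)."""
    window_end = window_start + window_seconds
    count = sum(1 for t in timestamps if window_start <= t < window_end)
    return count / window_seconds


# Hypothetical arrival times, in seconds since some reference point.
arrivals = [0.1, 0.5, 0.9, 1.2, 1.8, 2.5]
print(queries_per_second(arrivals, window_start=0, window_seconds=2))  # 2.5
```

Five of the six requests fall inside the two-second window, so the average is 2.5 QPS.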
Speed is important. It is a competitive advantage for a service to be fast and responsive.
Users consider a web site sluggish if replies do not come back in 200 ms or less. Network
latency eats up most of that time, leaving little time for the service to compose the page
itself.
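The latency budget described above is simple subtraction, sketched here. Only the 200 ms threshold comes from the text; the 150 ms network round trip and the function name are assumed values chosen for illustration.

```python
RESPONSE_BUDGET_MS = 200  # threshold from the text


def server_time_budget_ms(network_round_trip_ms, budget_ms=RESPONSE_BUDGET_MS):
    """Return the time left for the service once network latency is spent."""
    return max(0, budget_ms - network_round_trip_ms)


# Assumed network round trip of 150 ms (not a figure from the text).
print(server_time_budget_ms(150))  # 50
```

Under that assumption the service has only 50 ms to compose the page, which is why every internal step has to be fast.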
In distributed systems, failure is normal. Hardware failures that are rare, when
multiplied by thousands of machines, become common. Therefore failures are assumed,
designs work around them, and software anticipates them. Failure is an expected part
of the landscape.
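A rough calculation shows why rare failures become common at scale. This sketch assumes each machine fails independently with the same annual probability; the 5 percent rate and the fleet size of 10,000 are hypothetical numbers, not figures from the text.

```python
def expected_failures_per_day(machines, annual_failure_rate):
    """Average number of machine failures per day across the fleet."""
    return machines * annual_failure_rate / 365


def prob_at_least_one_failure_per_day(machines, annual_failure_rate):
    """Chance that at least one machine fails on a given day.

    Assumes failures are independent and evenly spread over the year.
    """
    daily_rate = annual_failure_rate / 365
    return 1 - (1 - daily_rate) ** machines


# Hypothetical fleet: 10,000 machines, each with a 5% chance of failing per year.
print(round(expected_failures_per_day(10_000, 0.05), 2))          # about 1.37
print(round(prob_at_least_one_failure_per_day(10_000, 0.05), 2))  # about 0.75
```

A single machine with these odds fails about once in twenty years, yet the fleet as a whole sees more than one failure on an average day.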
Due to the sheer size of distributed systems, operations must be automated. It is
inconceivable to manually do tasks that involve hundreds or thousands of machines.
Automation becomes critical for preparation and deployment of software, regular
operations, and handling failures.
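As an illustration of automating a task across many machines, the following sketch runs one operation on a whole fleet in parallel and retries the machines that fail. The helper names, the `fake_deploy` task, and the host names are all hypothetical; a real deployment tool would do far more.

```python
from concurrent.futures import ThreadPoolExecutor


def _attempt(task, machine):
    """Run task(machine); report success as True and any exception as False."""
    try:
        task(machine)
        return True
    except Exception:
        return False


def run_on_all(machines, task, max_workers=32, max_attempts=3):
    """Run task on every machine in parallel, retrying the ones that fail.

    Returns (succeeded, still_failing) so a human only looks at the leftovers.
    """
    pending = list(machines)
    succeeded = []
    for _ in range(max_attempts):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            outcomes = list(pool.map(lambda m: _attempt(task, m), pending))
        succeeded += [m for m, ok in zip(pending, outcomes) if ok]
        pending = [m for m, ok in zip(pending, outcomes) if not ok]
    return succeeded, pending


# Simulated task: "host-7" is permanently broken, everything else works.
def fake_deploy(machine):
    if machine == "host-7":
        raise RuntimeError("unreachable")


fleet = [f"host-{i}" for i in range(1000)]
ok, failed = run_on_all(fleet, fake_deploy)
print(len(ok), failed)  # 999 ['host-7']
```

The design choice worth noting is that failure handling is built into the loop: transient errors are retried automatically, and only machines that keep failing are reported for attention.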
1.1 Visibility at Scale
To manage a large distributed system, one must have visibility into the system. The ability
to examine internal state, called introspection, is required to operate, debug, tune, and
repair large systems.
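One way to picture introspection is a server that keeps counters about its own activity and can report them on request. This is a minimal sketch; the class name, the counters chosen, and the `snapshot` method are assumptions for illustration, not an interface described in the text.

```python
import threading
import time


class IntrospectableServer:
    """Toy server that records its own activity so operators can inspect it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._started = time.monotonic()
        self._requests = 0
        self._errors = 0

    def handle(self, work):
        """Run one request (a zero-argument callable) and count the outcome."""
        with self._lock:
            self._requests += 1
        try:
            return work()
        except Exception:
            with self._lock:
                self._errors += 1
            return None

    def snapshot(self):
        """Return a copy of internal state, suitable for a status page."""
        with self._lock:
            return {
                "requests": self._requests,
                "errors": self._errors,
                "uptime_seconds": time.monotonic() - self._started,
            }


server = IntrospectableServer()
server.handle(lambda: "ok")
server.handle(lambda: 1 / 0)  # fails, counted as an error
state = server.snapshot()
print(state["requests"], state["errors"])  # 2 1
```

In practice such a snapshot would be exported to a monitoring system, so that the state of thousands of servers can be examined without logging in to each one.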
In a traditional system, one could imagine an engineer who knows enough about the system
to keep an eye on all the critical components or “just knows” what is wrong based on
experience. In a large system, that level of visibility must be actively created by designing