Information Technology Reference
In-Depth Information
Case Study: Twitter's Early Database Architecture
WhenTwitterwasverynew,thehistoryofallTweetsfitonasingledatabaseserver
running MySQL. When that server filled up, Twitter started a new database server
and modified its software to handle the fact that its data was now segmented by
date.
As Twitter became more popular, the amount of time between a new segment
being started and that new database filling up decreased rapidly. It became a race
fortheoperationsteamtokeepupwithdemand.Thissolutionwasnotsustainable.
Loadwasunbalanced,asoldermachinesdidn'tgetmuchtraffic.Thissolutionwas
also expensive, as each machine required many replicas. It was logistically com-
plex as well.
Eventually, Twitter moved to a home-grown database system called T-bird,
based on Gizzard, which smoothly scales automatically.
Another way to segment data is by geography. In a global service it is common practice
to set up many individual data stores around the world. Each user's data is kept on the
nearest store. This approach also gives users faster access to their data because it is stored
closer to them.
Going from an unsegmented database to a segmented one may require considerable re-
factoring of application code. Thus scaling on the z -axis is often undertaken only when
scaling using the x - and y -axes is exhausted.
Additional ways to segment data include the following:
By Hash Prefix: This is known as sharding and is discussed later.
By Customer Functionality: For example, eBay segments by product—cars, elec-
tronics, and so on.
By Utilization: Putting high-use users in dedicated segments.
By Organizational Division: For example, sales, engineering, business develop-
ment.
Hierarchically: The segments are kept in a hierarchy. DNS uses this pattern, look-
ing up an address like www.everythingsysadmin.com first in the root servers, then
in the servers for com, and finally in the servers for the domain everythingsysad-
min.
Search WWH ::




Custom Search