Databases Reference
In-Depth Information
has become a highly automated process in both big data and fault-tolerant systems.
Let's look at how sharding works and explore its challenges.
Let's say you've created a website that allows users to log in and create their own
personal space to share with friends. They have profiles, text, and product informa-
tion on things they like (or don't like). You set up your website, store the information
in a MySQL database, and you run it on a single CPU . People love it, they log in, create
pages, invite their friends, and before you realize it your disk space is 95% full. What
do you do? If you're using a typical RDBMS system, the answer is buy a new system and
transfer half the users to the new system. Oh, and your old system might have to be
down for a while so you can rewrite your application to know which database to get
information from. Figure 2.9 shows a typical example of database sharding.
The process of moving from a single to multiple databases can be done in a num-
ber of ways; for example:
You can keep the users with account names that start with the letters A-N on the
first drive and put users from O - Z on the new system.
1
You can keep the people in the United States on the original system and put the
people who live in Europe on the new system.
2
You can randomly move half the users to the new system.
3
Each of these alternatives has pros and cons. For example, in option 1, if a user
changes their name, should they be automatically moved to the new drive? In
option 2, if a user moves to a new country, should all their data be moved? If people
tend to share links with people near them, would there be performance advantages to
keeping these users together? What if people in the United States tend to be active at
the same time in the evening? Would one database get overwhelmed and the other be
Before
shard
Warning—processor at 90% capacity!
Time to “shard”— copy half of the data
to a new processor.
After
shard
Each processor gets
half of the load.
Original processor
New processor
Figure 2.9 Sharding is performed when a single processor can't handle the
throughput requirements of a system. When this happens you'll want to
move the data onto two systems that each take half the work. Many NoSQL
systems have automatic sharding built in so that you only need to add a new
server to a pool of working nodes and the database management system
automatically moves data to the new node. Most RDBMSs don't support
automatic sharding.
Search WWH ::




Custom Search