partitioning of data by row in relational databases is not new; in parallel database technology it is referred to as horizontal partitioning. The distinction between sharding and horizontal partitioning is that horizontal partitioning is performed by the database, transparently to the application, whereas sharding is explicit partitioning performed by the application. However, the two techniques have begun to converge, as traditional database vendors now offer support for more sophisticated partitioning strategies.
Since sharding is similar to horizontal partitioning, we first discuss different horizontal partitioning techniques. As the discussion shows, a good sharding technique depends on both the organization of the data and the type of queries expected.
The main sharding techniques are as follows:
1. Round-robin partitioning: The round-robin method distributes rows over the databases in turn. In the example, we could partition the transaction table across multiple databases so that the first transaction is stored in the first database, the second in the second database, and so on, wrapping back to the first database after the last one. The advantage of round-robin partitioning is its simplicity. Its disadvantage is that it loses record associations: because a row's location does not depend on its contents, a query for, say, all transactions of one customer must be sent to every database. Hash partitioning and range partitioning avoid this disadvantage when the partitioning attribute is the one used in queries.
2. Hash partitioning: In this method, the value of a selected attribute is hashed to determine the database in which the tuple should be stored. If queries are frequently made on an attribute (say, Customer_Id), associations can be preserved by hashing on this attribute, so that all records with the same value of the attribute are stored in the same database.
3. Range partitioning: Range partitioning stores records whose values of the partitioning attribute fall within the same range in the same database. For example, the range of Customer_Id values could be divided among different databases. Again, if the attributes chosen for grouping are those on which queries are frequently made, record associations are preserved and results from different databases need not be merged. Range partitioning can, however, be susceptible to load imbalance unless the ranges are chosen carefully. Poorly chosen ranges can cause an imbalance in the amount of data stored in the partitions (data skew) or in the query workload executed on each partition (execution skew). These problems are less likely with round-robin and hash partitioning, since both tend to distribute the data uniformly over the partitions.
Because hash partitioning both preserves record associations and distributes data evenly, it is particularly well suited to large-scale systems.
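The three routing rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production sharding layer: the class names, the three-shard setup, the range boundaries 1000 and 2000, and the use of SHA-256 as the hash function are all hypothetical choices made for the example. Each router answers two questions: which shard receives a new row, and which shards must be searched for a given Customer_Id.

```python
import hashlib
from bisect import bisect_right
from itertools import count


def stable_hash(value):
    """Deterministic hash (Python's built-in hash() of str varies between runs)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class RoundRobinRouter:
    """Places each new row on the next shard in turn, ignoring its contents."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self._next = count()

    def shard_for_insert(self, row):
        return next(self._next) % self.num_shards

    def shards_for_lookup(self, key):
        # A row's shard does not depend on its key, so every shard must be queried.
        return list(range(self.num_shards))


class HashRouter:
    """Places a row on the shard given by hashing the partitioning attribute."""

    def __init__(self, num_shards, attribute):
        self.num_shards = num_shards
        self.attribute = attribute

    def _shard(self, key):
        return stable_hash(key) % self.num_shards

    def shard_for_insert(self, row):
        return self._shard(row[self.attribute])

    def shards_for_lookup(self, key):
        # Equal keys always hash to the same shard, so one shard suffices.
        return [self._shard(key)]


class RangeRouter:
    """Places a row on the shard whose key range contains the attribute value."""

    def __init__(self, boundaries, attribute):
        # Boundaries [1000, 2000] define three shards:
        # key < 1000 -> 0, 1000 <= key < 2000 -> 1, key >= 2000 -> 2.
        self.boundaries = sorted(boundaries)
        self.attribute = attribute

    def _shard(self, key):
        return bisect_right(self.boundaries, key)

    def shard_for_insert(self, row):
        return self._shard(row[self.attribute])

    def shards_for_lookup(self, key):
        return [self._shard(key)]


if __name__ == "__main__":
    # Hypothetical transaction rows; Customer_Id 42 appears three times.
    transactions = [
        {"Transaction_Id": i, "Customer_Id": c}
        for i, c in enumerate([42, 7, 42, 1500, 42, 2500])
    ]
    routers = {
        "round-robin": RoundRobinRouter(3),
        "hash": HashRouter(3, "Customer_Id"),
        "range": RangeRouter([1000, 2000], "Customer_Id"),
    }
    for name, router in routers.items():
        placement = [router.shard_for_insert(t) for t in transactions]
        print(f"{name:12} placement={placement} "
              f"shards to search for Customer_Id 42: {router.shards_for_lookup(42)}")
```

With round-robin placement, the three transactions of customer 42 land on three different shards, so a lookup must visit all of them; with hash or range placement they share a single shard, and the lookup touches only that one.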
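The data-skew problem of range partitioning can be made concrete with a small, deterministic experiment. The key distribution below is invented for illustration: 900 customers have Customer_Id values below 1000, and only 100 are spread over 1000 to 2999. Range boundaries that split the ID space evenly (1000 and 2000, an assumed choice) still place 90% of the rows on one shard, whereas hashing the same keys (SHA-256 is assumed here as the hash function) spreads them roughly evenly over the three shards.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def range_shard(key, boundaries):
    # Keys below boundaries[0] go to shard 0, keys in
    # [boundaries[0], boundaries[1]) to shard 1, and so on.
    return bisect_right(boundaries, key)


def hash_shard(key, num_shards):
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def rows_per_shard(keys, shard_fn, num_shards):
    """Count how many keys each shard would receive."""
    counts = Counter(shard_fn(k) for k in keys)
    return [counts.get(s, 0) for s in range(num_shards)]


# Hypothetical skewed key set: dense low IDs, sparse high IDs (1000 keys total).
keys = list(range(0, 900)) + list(range(1000, 3000, 20))

by_range = rows_per_shard(keys, lambda k: range_shard(k, [1000, 2000]), 3)
by_hash = rows_per_shard(keys, lambda k: hash_shard(k, 3), 3)

print("range:", by_range)  # [900, 50, 50]
print("hash: ", by_hash)
```

Choosing range boundaries from the observed key distribution (for example, from quantiles of Customer_Id) rather than from the nominal ID space would reduce this imbalance, which is what "chosen carefully" amounts to in practice.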
Round-robin achieves a uniform distribution of records simply, but it does not allow operations to be restricted to single partitions. While range partitioning