Databases Reference
In-Depth Information
Shards don't necessarily help performance; in certain cases, a shard hurts both performance and scalability, because
it imposes an overhead that wouldn't otherwise exist. Figure 9-9 shows the difference between a standard ADO.NET call
selecting records and the best-case and worst-case scenarios when fetching the same records from a shard. In the best
case, all records are assumed to be split across three distinct databases; the shard is able to access all three
databases concurrently, aggregate the three result sets, and filter and/or sort the data. To do so, the shard must
manage all of the following, which consumes processing time:
• Loops for connecting to the underlying databases
• Loops for fetching the data
• Data aggregation, sorting, and filtering
In the worst case, none of these operations can be executed in parallel, and they must all run serially. This may be
the case if the Task Parallel Library (TPL) detects that only a single processor is available. Finally, you may end up
in a situation that mixes worst- and best-case scenarios, where some of the calls can be made in parallel, but not all.
Figure 9-9. Data access overhead comparison
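The best and worst cases described above can be illustrated with a minimal sketch. It is written in Python rather than C# so that it runs on its own; in .NET, the TPL's Parallel.ForEach would play the role of the thread pool used here. The three in-memory lists, the fetch helper, and the query_shards function are all hypothetical stand-ins for real databases and a real shard library, not the implementation discussed in this chapter.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three underlying databases; a real shard
# would open a connection to each one instead of reading a list.
SHARDS = [
    [{"id": 3, "title": "C"}, {"id": 1, "title": "A"}],
    [{"id": 5, "title": "E"}],
    [{"id": 2, "title": "B"}, {"id": 4, "title": "D"}],
]


def fetch(shard):
    """Simulate connecting to one database and fetching its rows."""
    return list(shard)


def query_shards(shards, predicate, sort_key, max_workers=None):
    """Fan out to every shard, then aggregate, filter, and sort."""
    # max_workers=1 forces the worst case: one fetch at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        result_sets = list(pool.map(fetch, shards))
    # Aggregating, filtering, and sorting on the client is the extra
    # work that a query against a single database would not need.
    merged = [row for rows in result_sets for row in rows]
    return sorted((r for r in merged if predicate(r)), key=sort_key)


# Best case: the pool may fetch from all shards concurrently.
best = query_shards(SHARDS, lambda r: r["id"] > 1, lambda r: r["id"])
# Worst case: a single worker makes the fetches run serially.
worst = query_shards(SHARDS, lambda r: r["id"] > 1, lambda r: r["id"],
                     max_workers=1)
```

Both calls return the same rows; only the degree of concurrency differs. In either case the shard still pays for the loops, the merge, and the sort.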
Now that all the warnings are laid out, let's look at a scenario for which a shard makes sense and probably
improves both performance and scalability. Imagine a DOCS table that contains only two records. The table contains
a few fields that represent document metadata, such as Title and Author ID. However, this table also contains a
large field: a varbinary column called Document that holds a PDF file. Each PDF file is a few megabytes in size.
Figure 9-10 shows the output of the table. Because this database is hosted in SQL Database, the SELECT * FROM DOCS
statement returns a few megabytes of data over an SSL-encrypted link. The execution of this statement takes about 2.5
seconds on average, or roughly 1.25 seconds per record.
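A back-of-the-envelope model shows why this scenario is a candidate for sharding. The sketch below assumes that fetch time scales linearly with the number of records, and that a shard running in parallel finishes when its slowest database finishes, plus a fixed overhead. The 0.3-second overhead is a made-up placeholder, not a measurement; only the 1.25 seconds per record comes from the figures above.

```python
PER_RECORD_SECONDS = 1.25  # from the text: about 2.5 s for two records


def serial_seconds(per_record, records):
    """Time to fetch every record from one database, one after another."""
    return per_record * records


def sharded_seconds(per_record, records_per_shard, overhead):
    """Time when each shard is fetched in parallel (assumed model).

    The slowest shard dominates, and the shard layer adds a fixed
    overhead for connecting, aggregating, and sorting.
    """
    return per_record * max(records_per_shard) + overhead


single = serial_seconds(PER_RECORD_SECONDS, 2)
# Hypothetical: one record per database, 0.3 s of shard overhead.
sharded = sharded_seconds(PER_RECORD_SECONDS, [1, 1], overhead=0.3)
```

Under this model, the two-database shard comes out ahead as long as its overhead stays below the 1.25 seconds saved by fetching the two records concurrently.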
 