Designing for High Performance - Pro SQL Database for Windows Azure

Shards don't necessarily help performance; in certain cases, a shard hurts both performance and scalability. The reason is that a shard imposes an overhead that wouldn't otherwise exist. Figure 9-9 shows the difference between a standard ADO.NET call selecting records and the best-case and worst-case scenarios when fetching the same records from a shard. In the best case, the records are assumed to be split across three distinct databases; the shard is able to access all three databases concurrently, aggregate the three resultsets, and filter and/or sort the data. The shard must then manage all of the following, each of which consumes processing time:

• Loops for connecting to the underlying databases
• Loops for fetching the data
• Data aggregation, sorting, and filtering
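The three sources of overhead listed above can be made concrete with a small sketch. It is written in Python with SQLite files standing in for the three databases, rather than the ADO.NET and TPL code the chapter discusses, so it only illustrates the shape of the work. The `users` table, the sample rows, and all function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def create_database(path, rows):
    """Create one stand-in shard member (hypothetical `users` table)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def fetch_from(path, sql):
    """Overhead 1 and 2: open a connection to one database and fetch its rows."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def query_shard(paths, sql, sort_key):
    """Fan the query out to every database, then aggregate and sort."""
    # One worker per database, so the calls may overlap (the best case).
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        partials = list(pool.map(lambda p: fetch_from(p, sql), paths))
    # Overhead 3: merge the partial resultsets and sort the combined rows.
    merged = [row for part in partials for row in part]
    return sorted(merged, key=sort_key)


def demo():
    """Split six sample rows across three databases and query them as one."""
    workdir = tempfile.mkdtemp()
    data = [
        [(3, "Carol"), (6, "Frank")],
        [(1, "Alice"), (4, "Dave")],
        [(2, "Bob"), (5, "Eve")],
    ]
    paths = []
    for index, rows in enumerate(data):
        path = os.path.join(workdir, f"shard{index}.db")
        create_database(path, rows)
        paths.append(path)
    return query_shard(paths, "SELECT id, name FROM users", lambda row: row[0])
```

Calling `demo()` returns the six rows ordered by `id`, even though no single database holds them in that order. A plain query against one database needs none of the fan-out, merge, or sort steps, which is the overhead the list describes.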

In the worst case, none of these operations can be executed in parallel, and they must run serially. This may happen if the Task Parallel Library (TPL) detects that only a single processor is available. Finally, you may end up in a situation that mixes worst- and best-case scenarios, where some of the calls can be made in parallel, but not all.
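A rough way to reason about the best, worst, and mixed cases is to estimate the elapsed time for a given number of concurrent calls. The following Python sketch is a simplified model, not a measurement: the fetch times, the fixed aggregation overhead, and the function name are all hypothetical, and the `workers` parameter stands in for however many calls the runtime manages to run at once.

```python
import heapq


def estimated_elapsed_ms(fetch_times_ms, workers, overhead_ms=0):
    """Estimate wall-clock time when calls are dispatched, in order, to `workers` slots."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    slot_count = max(1, min(workers, len(fetch_times_ms)))
    # Each slot tracks the time at which it becomes free again.
    slots = [0] * slot_count
    heapq.heapify(slots)
    for fetch_time in fetch_times_ms:
        earliest_free = heapq.heappop(slots)
        heapq.heappush(slots, earliest_free + fetch_time)
    # The query is done when the busiest slot finishes, plus aggregation overhead.
    return max(slots) + overhead_ms


# Hypothetical per-database fetch times and aggregation overhead, in milliseconds.
FETCH_TIMES_MS = [400, 600, 300]
OVERHEAD_MS = 50

best_case = estimated_elapsed_ms(FETCH_TIMES_MS, workers=3, overhead_ms=OVERHEAD_MS)
worst_case = estimated_elapsed_ms(FETCH_TIMES_MS, workers=1, overhead_ms=OVERHEAD_MS)
mixed_case = estimated_elapsed_ms(FETCH_TIMES_MS, workers=2, overhead_ms=OVERHEAD_MS)
```

With these assumed numbers, the best case costs as much as the slowest database plus overhead (650 ms), the worst case costs the sum of all three fetches plus overhead (1350 ms), and the mixed case with two concurrent calls falls in between (750 ms).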

Figure 9-9. Data access overhead comparison

Now that all the warnings are laid out, let's look at a scenario in which a shard makes sense and probably improves both performance and scalability. Imagine a DOCS table that contains only two records. The table contains a few fields that represent document metadata, such as Title and Author ID. However, this table also contains a large field: a varbinary column called Document that holds a PDF file. Each PDF file is a few megabytes in size. Figure 9-10 shows the contents of the table. Because this database is hosted in SQL Database, the SELECT * FROM DOCS statement returns a few megabytes of data over an SSL-encrypted link. The execution of this statement takes about 2.5 seconds on average, or roughly 1.25 seconds per record.
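To see why this query is heavy, it helps to look at where the bytes come from. The sketch below rebuilds a table of the same shape in an in-memory SQLite database using Python, which is only a stand-in for SQL Database. The `Id` column, the exact spelling `AuthorId`, the 2 MB document size, and the zero-filled placeholder content are all assumptions made for illustration.

```python
import sqlite3

DOC_SIZE = 2 * 1024 * 1024  # assumed size of each stand-in PDF, in bytes


def build_docs_table():
    """Create a two-record DOCS table with metadata columns and one large column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DOCS ("
        "Id INTEGER PRIMARY KEY, Title TEXT, AuthorId INTEGER, Document BLOB)"
    )
    rows = [
        (1, "First document", 10, bytes(DOC_SIZE)),   # zero-filled placeholder
        (2, "Second document", 11, bytes(DOC_SIZE)),  # zero-filled placeholder
    ]
    conn.executemany("INSERT INTO DOCS VALUES (?, ?, ?, ?)", rows)
    return conn


def binary_payload_bytes(conn, sql):
    """Add up the size of every binary value returned by the query."""
    total = 0
    for row in conn.execute(sql):
        total += sum(len(value) for value in row if isinstance(value, bytes))
    return total


# Figures quoted in the text: about 2.5 seconds for two records.
TOTAL_SECONDS = 2.5
RECORD_COUNT = 2
SECONDS_PER_RECORD = TOTAL_SECONDS / RECORD_COUNT  # 1.25
```

With the assumed sizes, `binary_payload_bytes(build_docs_table(), "SELECT * FROM DOCS")` reports 4 MB of binary data for just two rows, so nearly all of the transfer cost comes from the Document column rather than from the metadata fields.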
