Databases Reference
In-Depth Information
Continuing on understanding how workloads can impact the performance, for a transactional
query the workload is streamlined and can be compared to automobiles on expressways. There is
a smooth flow of traffic and all drivers follow lane management principles. The flow is interrupted
when a breakdown, accident, or large volume of big rigs merge in the traffic. On a similar note, you
will see a slowdown or failure of query processing if there is an infrastructure breakdown or a large
and complex query that might have been generated by an ad-hoc user, which will resemble traffic
jams on any large-city expressway when an accident occurs.
The hidden complexities of data warehouse queries are not discernable by visual inspection. The
reason for this is the design and architecture of the semantic layers built on top of the data warehouse,
which allows the user to navigate the relationships in the underlying data and ask questions that can
drive workload and optimization demands in multiple layers, and sometimes creates conflicts between
the optimization and its workload model in the database.
For example, to determine the sales dollars per month and year for each customer in a household,
the query on a data warehouse will look as follows:
Select cust.customer_name, cust.cust_id, household.household_id, household.
state, sum(sales.net_sales_amt) net_sales, sales.month_num, sales.year_num from
cust,household,sales where cust.household_id = household.household_id and cust.cust_
id = sales.cust_id group by cust.customer_name, cust.cust_id, household.household_id,
sales.month_num, sales.year_num
Let us assume the tables sizes are:
Customer: 300,000 rows
Household: 190,000 rows
Sales: 75,000,000 rows
To execute this query, the data warehouse will perform several cycles of computational processing
across layers of infrastructure, as shown in Figure 6.3 :
There are large data sets that will be read from the disk and moved to the memory and back to the
disk. This will lead to excessive data swapping between different layers on a continued basis.
Additionally, the same data may be needed multiple times and depending on how stale or fresh
the data is, the query might have to repeat the processing steps since there is not enough space to
persist the result sets.
Depending on the configuration setting of the database and the server, the memory and CPU are
divided between the multiple threads that are generated by the query for processing. For example,
a dual-core CPU can execute four threads—two per CPU. In the case of an average server the
CPU can execute 32 threads for eight CPUs. If there are 32 threads and eight users running, the
CPU cycles will be divided for the eight users.
If there are multiple queries executing operations in parallel, there will be contention for
resources, as everybody will be on the same shared-everything architecture. Multiple threads of
execution can decrease throughput and increase swapping of data and slowdown of execution.
If the data warehouse load operations execute in parallel with the query processing, they can
evolve into sync issues especially in circular references.
A mix of query and load operations can cause the database to underperform in the shared
architecture as it creates a mixed workload.
Adding ad-hoc users and data mining type of operations can cause further underperformance.
Search WWH ::




Custom Search