Emerging Database Landscape - Big Data Imperatives

Databases Reference

In-Depth Information

Unpredictable queries also constitute a large part of the computation issues.

Unpredictable queries (as used in exploratory analysis) are, by definition, those where

you do not know in advance what the user may want to find out. These impose a number

of problems, including:

•

You do not know whether the answer to a query will require

aggregated data or access to transaction-level data. If the answer

can be satisfied by aggregated data then multi-dimensional cubes,

materialized views, and pre-defined time-series type of data

preparation may be appropriate.

•

Even where a query may be satisfied through use of pre-

aggregated data, the nature of unpredictable queries is such that

you cannot always guarantee that the correct aggregations are

available. If they are not then the materialized views and cubes

will need to be regenerated or in many cases if the queries are

looking for attributes that are not in the materialized views and

cubes, then you will have to revert to creation of new

pre-aggregation mechanisms.

•

The other problem with unpredictable queries is that appropriate

indexes may not be defined. While the database optimizer can

re-write badly constructed SQL, and determine the most efficient

joins and optimize the query path in general, it cannot do

anything for lack of indexes. In practice, if a column is not indexed

at all, then this will usually mean that the query has to perform

a full table scan, and if this is a large table then there will be a

substantial performance hit as a result.

■ One possible option is to build indexes on every conceivable column. Unfortunately

this is not usually practical. Every index you build will help to improve the performance

of queries that use that index, but at the same time every index you add to the database

increases the size of the database.

Note

Certain types of queries require that the whole of a table must be scanned. Some

of these arise when there are no available indexes, or from the sorts of complex queries

described above. However, very much simpler queries can also give rise to full table

scans. For example:

• “List the full name and email address for customers born in July:”

Given that one in 12 customers are born in July, a typical database

optimizer will not consider it worthwhile to use an index, and it

will conduct a full table scan. If you have 10 million customers for

each of whom you store 3,200 bytes, for instance, then this will

mean reading a total of 32,000,000,000 bytes.

Big Data Imperatives

Search WWH ::

Custom Search

Home