In practice, the overhead related to the position of the columns is more noticeable in one of the following situations:
• When tables have many columns and the SQL statements frequently reference very few of them, located near the end of the row.
• When many rows are read from a single block, such as during a full table scan. This is because, more often than not, when only a few rows are read per block, the overhead for locating and accessing the block is by far more significant than the overhead for locating and accessing the columns. For example, if I run the column_order.sql script with PCTFREE set to 90 (thereby decreasing the number of rows per block), the query referencing the first column performs less than two times faster than the query referencing the 250th column (as shown in Figure 16-2, it's about five times faster with PCTFREE set to 10). A sketch of this kind of test is shown below.
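The following is a minimal sketch of the kind of comparison such a test performs; the table name t, the row count, and the PCTFREE value are illustrative assumptions, not the actual column_order.sql code. It builds a 250-column table dynamically and then times a query on the first column against a query on the last one.

-- Build a hypothetical 250-column test table; with PCTFREE 90 only a few
-- rows fit into each block.
DECLARE
  l_sql VARCHAR2(32767) := 'CREATE TABLE t PCTFREE 90 AS SELECT ';
BEGIN
  FOR i IN 1 .. 250
  LOOP
    l_sql := l_sql || 'rownum AS c' || i || ',';
  END LOOP;
  l_sql := rtrim(l_sql, ',') || ' FROM dual CONNECT BY level <= 10000';
  EXECUTE IMMEDIATE l_sql;
END;
/

-- Compare access to a column at the beginning of the row with access to
-- one at the end of the row (timings shown by SQL*Plus).
SET TIMING ON
SELECT count(c1) FROM t;
SELECT count(c250) FROM t;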
Since trailing NULL values aren't stored, it makes sense to place columns expected to contain NULL values at the
end of the table. In this way, the number of physically stored columns and consequently the average size of the rows
might decrease.
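As a sketch of this advice (table and column names are purely illustrative), the columns that are expected to be NULL for most rows are declared last:

CREATE TABLE orders (
  id            NUMBER        NOT NULL,
  customer_id   NUMBER        NOT NULL,
  amount        NUMBER(10,2)  NOT NULL,
  -- columns that are NULL for most rows go at the end, so that the
  -- trailing NULL values don't have to be physically stored
  cancelled_at  DATE,
  cancel_reason VARCHAR2(200)
);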
Optimal Datatype
As briefly described in Chapter 1, specifically in the “Designing for Performance” section, in recent years I have
witnessed a worrying trend in physical design that I call wrong datatype selection. At first glance, choosing the
datatype for a column seems like a very straightforward decision to make. Nevertheless, in a world where software
architects spend a lot of time discussing high-level things such as agile software development, SOA, or persistence
frameworks, most people seem to forget about low-level ones. I'm convinced it's essential to get back to the basics and
discuss why datatype selection is important.
Pitfalls in Datatype Selection
To illustrate wrong datatype selection, I present five examples of typical problems that I have encountered over and
over again.
The first problem caused by wrong datatype selection is wrong or lacking validation of data when it's inserted or
modified in the database. For example, if a column is supposed to store numeric values, choosing a character string
datatype for it calls for external validation. In other words, the database engine isn't able to validate the data; that task is left to the application. Even if such a validation is easy to implement, bear in mind that every time the same
piece of code is spread to several locations, instead of being centralized in the database, sooner or later there will be
a mismatch in functionality (typically, in some locations the validation may be forgotten, or maybe the validation
changes later and its implementation is updated only in some locations). The example I'm presenting is related to
the nls_numeric_characters initialization parameter. Remember that this initialization parameter specifies the characters used as the decimal and group separators. For example, in Switzerland it's usually set to “.,”, and therefore the value of pi is formatted as 3.14159. In Germany, in contrast, it's commonly set to “,.”, and therefore the same value is formatted as 3,14159. Sooner or later, running an application with different client-side settings of this initialization parameter will cause an ORA-01722: invalid number error if conversions from VARCHAR2 to NUMBER take place because a wrong datatype is used in the database. And by the time you notice this, your database will be
filled with VARCHAR2 columns containing both formats, and therefore a painful data reconciliation will have to
be performed.
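The following minimal sketch shows how the problem surfaces; the table and column names are illustrative assumptions:

-- Session with the Swiss-style setting: '.' is the decimal separator
ALTER SESSION SET nls_numeric_characters = '.,';
CREATE TABLE measurements (val VARCHAR2(20));       -- wrong datatype: a number stored as text
INSERT INTO measurements VALUES (TO_CHAR(3.14159)); -- stored as '3.14159'
COMMIT;

-- Session with the German-style setting: ',' is the decimal separator
ALTER SESSION SET nls_numeric_characters = ',.';
SELECT * FROM measurements WHERE val > 3;           -- implicit TO_NUMBER('3.14159') fails with ORA-01722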
The second problem caused by wrong datatype selection is loss of information. In other words, during the
conversion of the original (correct) datatype to the database datatype, information gets lost. For example, imagine
what happens when the date and time of an event are stored with a DATE datatype instead of a TIMESTAMP WITH TIME
ZONE datatype. Fractional seconds and time zone information get lost. Although the former leads to what could be
 