considered a small error (less than 1 second), the latter might be a bigger problem. In one case that I witnessed, a
customer's data was always generated using local standard time (without daylight saving time adjustments) and
stored directly in the database. The problems arose when, for reporting purposes, a correction for daylight saving time
had to be applied. For that purpose, a function converting between two time zones was implemented. Its signature
was the following:
new_time_dst(in_date DATE, tz1 VARCHAR2, tz2 VARCHAR2) RETURN DATE
Calling such a function once was very fast. The problem was calling it thousands of times for each report; as a
result, the response time increased by a factor of 25. Clearly, with the correct datatype, everything would not only
be faster, but also easier (the conversion would be performed automatically).
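To illustrate, the following is a minimal sketch (the orders_ts table and its column are made up for this example) showing how the correct datatype removes the need for such a function. With TIMESTAMP WITH TIME ZONE, the daylight saving time rules of both regions are applied automatically:
 
SQL> CREATE TABLE orders_ts (created_at TIMESTAMP WITH TIME ZONE);
 
SQL> INSERT INTO orders_ts
  2  VALUES (TIMESTAMP '2014-06-15 14:30:00 Europe/Zurich');
 
SQL> SELECT created_at AT TIME ZONE 'America/New_York' AS created_at_ny
  2  FROM orders_ts;
 
Because Europe/Zurich observes CEST (UTC+2) in June and America/New_York observes EDT (UTC-4), the query returns 08:30 of the same day; no hand-written conversion function is involved.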
The third problem caused by wrong datatype selection is that things don't work as expected. Let's say you
have to range-partition a table based on a DATE or TIMESTAMP column storing date and time information. This is
usually no big deal. The problem arises if the column used as the partition key contains the numeric representation
of the datetime value based on some format mask, or an equivalent string representation, instead of plain DATE or
TIMESTAMP values. If the conversion from the datetime values to the numeric values is performed with a format mask
like YYYYMMDDHH24MISS, the definition of the range partitions is still possible. However, if the conversion is based on
a format mask like DDMMYYYYHH24MISS, you can't solve the problem without changing the datatype or the format of
the column, because the numeric (or string) order doesn't preserve the natural datetime order (as of version 11.1,
in some cases it's possible to work around the problem by implementing virtual column-based partitioning, as
sketched below).
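The following is a minimal sketch of that workaround (the table and partition names are made up, and it assumes a NUMBER column holding DDMMYYYYHH24MISS values): a virtual column reconstructs the DATE value and serves as the partition key. The FM00000000000000 mask restores the leading zero that the numeric representation drops for days 01 through 09:
 
SQL> CREATE TABLE t_part (
  2    n NUMBER(14),
  3    d DATE GENERATED ALWAYS AS (
  4      to_date(to_char(n, 'FM00000000000000'), 'DDMMYYYYHH24MISS')
  5    ) VIRTUAL
  6  )
  7  PARTITION BY RANGE (d) (
  8    PARTITION p_2014_h1 VALUES LESS THAN (to_date('01072014', 'DDMMYYYY')),
  9    PARTITION p_2014_h2 VALUES LESS THAN (to_date('01012015', 'DDMMYYYY'))
 10  );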
The fourth problem caused by wrong datatype selection is related to the query optimizer. This is probably the
least obvious of this short list and also the one leading to the subtlest problems. The reason is that with the
wrong datatypes, the query optimizer makes wrong estimates and, consequently, chooses suboptimal access
paths. Frequently, when something like that happens, most people blame the query optimizer for “once again” not
doing its job correctly. In reality, the problem is that information is hidden from it, so the query optimizer can't do its
job correctly. To better understand this problem, take a look at the following example, which is based on the
wrong_datatype.sql script. Here, you're checking the estimated cardinality of similar restrictions applied to three
columns that store the same set of data (the date of each day in 2014) but that use different datatypes. As you
can see, the query optimizer is able to make a sensible estimate (the correct cardinality is 28) only for the column
that's correctly defined:
SQL> CREATE TABLE t (d DATE, n NUMBER(8), c VARCHAR2(8));
SQL> INSERT INTO t (d)
2 SELECT to_date('20140101','YYYYMMDD')+level-1
3 FROM dual
4 CONNECT BY level <= 365;
SQL> UPDATE t SET n = to_number(to_char(d,'YYYYMMDD')), c = to_char(d,'YYYYMMDD');
SQL> execute dbms_stats.gather_table_stats(ownname=>user, tabname=>'t')
SQL> SELECT * FROM t ORDER BY d;
D                  N C
--------- ---------- --------
01-JAN-14   20140101 20140101
02-JAN-14   20140102 20140102
...
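The cardinality check itself is performed with queries like the following (a sketch of the idea; the exact statements
in the wrong_datatype.sql script may differ). Each one restricts a different column to February 2014:
 
SQL> EXPLAIN PLAN FOR
  2  SELECT * FROM t
  3  WHERE d BETWEEN to_date('20140201','YYYYMMDD')
  4              AND to_date('20140228','YYYYMMDD');
 
SQL> SELECT * FROM table(dbms_xplan.display(NULL, NULL, 'basic +rows'));
 
SQL> EXPLAIN PLAN FOR
  2  SELECT * FROM t WHERE n BETWEEN 20140201 AND 20140228;
 
SQL> SELECT * FROM table(dbms_xplan.display(NULL, NULL, 'basic +rows'));
 
SQL> EXPLAIN PLAN FOR
  2  SELECT * FROM t WHERE c BETWEEN '20140201' AND '20140228';
 
SQL> SELECT * FROM table(dbms_xplan.display(NULL, NULL, 'basic +rows'));
 
Only the restriction on the d column yields an estimate close to 28. For n and c, the query optimizer assumes the
values are uniformly distributed between the minimum (20140101) and the maximum (20141231) and interpolates
linearly over that range; because the range is full of holes that contain no data (20140132 through 20140199, for
example), the resulting estimate is far too low.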
 