4
Designing and Creating Tables
Redundant Data
No matter how fast the database server is, it can only work well if it is processing efficiently
organized data. Much of the speed and efficiency of modern database engines relies on the
use of well-designed tables that are quick to process. One of the ways in which modern
DBMSs have improved database performance is by eliminating redundant data.
Redundant data is data that is repeated unnecessarily within a table. Figure 4.1 shows a
table that stores a record of accesses to a website. The client host details have been changed
for anonymity.
Look at each column in Figure 4.1. Every column has data in it that is repeated. For
instance, the webpage column has only three unique pages listed in it, and the index.php
page appears in this column four times. Now look at the browser column. Not only is the
Mozilla entry repeated three times, but it also stores a lot of text in each row. Although a
lot of data is repeated, not all of it is necessarily redundant.
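As a rough illustration of how repetition in a column can be measured, the following Python sketch counts how often each value occurs. The rows are invented for this example and are not the data from Figure 4.1; the helper name value_counts is also hypothetical.

```python
from collections import Counter

# Hypothetical log rows (NOT the data from Figure 4.1): (webpage, browser)
rows = [
    ("index.php", "Mozilla/5.0 (X11; Linux) Gecko Firefox"),
    ("about.php", "Mozilla/5.0 (X11; Linux) Gecko Firefox"),
    ("index.php", "Opera/9.0"),
    ("index.php", "Mozilla/5.0 (X11; Linux) Gecko Firefox"),
    ("contact.php", "Opera/9.0"),
    ("index.php", "Lynx/2.8"),
]


def value_counts(rows, column):
    """Count how often each value appears at the given column index."""
    return Counter(row[column] for row in rows)


page_counts = value_counts(rows, 0)
browser_counts = value_counts(rows, 1)

# Report each distinct value with the number of rows that contain it.
for page, count in page_counts.most_common():
    print(page, count)
for browser, count in browser_counts.most_common():
    print(browser, count)
```

A high count only shows that a value is repeated. Whether the repetition is redundant depends on whether it can be removed without losing information.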
Data is classed as redundant if it can be removed from a table without loss of information.
For instance, look at the extract from this table shown in Figure 4.2.
When viewed in isolation from the main table shown in Figure 4.1, the data shown in
Figure 4.2 has more obvious redundancies. Of the six rows shown, only three are unique.
When a remote machine accesses a web server, it does so through its web browser software,
using its IP address, which is (usually) matched to its domain name. While that browser is
looking at pages on your site, these three pieces of information normally remain the same
for each page request. Only the page being viewed, the referring page, and the time of the
request change.
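The idea that repeated client details can be removed without loss of information can be sketched in Python. This is a minimal illustration under stated assumptions, not the method the chapter goes on to use: the six rows are invented (they are not the data from Figure 4.2), and the function name split_clients is hypothetical. Each distinct combination of browser, IP address, and domain name is stored once, and every request keeps only a small reference to it.

```python
# Hypothetical six-row extract (NOT the data from Figure 4.2).
# Each row is (browser, ip, host), one row per page request.
extract = [
    ("Mozilla/5.0", "192.0.2.10", "host-a.example.com"),
    ("Mozilla/5.0", "192.0.2.10", "host-a.example.com"),
    ("Opera/9.0", "192.0.2.20", "host-b.example.com"),
    ("Mozilla/5.0", "192.0.2.10", "host-a.example.com"),
    ("Lynx/2.8", "192.0.2.30", "host-c.example.com"),
    ("Opera/9.0", "192.0.2.20", "host-b.example.com"),
]


def split_clients(rows):
    """Store each distinct client once and return per-request references."""
    clients = []  # unique client tuples; the list index serves as the client id
    ids = {}      # client tuple -> client id
    refs = []     # one client id per original row, in the original order
    for row in rows:
        if row not in ids:
            ids[row] = len(clients)
            clients.append(row)
        refs.append(ids[row])
    return clients, refs


clients, refs = split_clients(extract)

# Rebuild the original rows from the two smaller structures to confirm
# that nothing was lost by storing each client only once.
rebuilt = [clients[i] for i in refs]
print(len(clients), "unique clients for", len(extract), "requests")
print("lossless:", rebuilt == extract)
```

Because the original rows can be rebuilt exactly from the list of unique clients and the per-request references, the repeated copies were redundant in the sense defined above.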
Figure 4.1
Example website log.