Table 3.2 Stock Prices for AllElectronics and HighTech

Time point    AllElectronics    HighTech
t1            6                 20
t2            5                 10
t3            4                 14
t4            3                 5
t5            2                 5

(e.g., the data follow multivariate normal distributions) does a covariance of 0 imply
independence.
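
A small numeric illustration may help show why a covariance of 0 does not, in general, guarantee independence. The sketch below uses hypothetical data that are not from the text: X takes the values -1, 0, and 1 with equal probability, and Y = X^2, so Y is fully determined by X. The helper name `covariance` is an assumption made for this illustration.

```python
from fractions import Fraction

def covariance(xs, ys):
    """Population covariance E(X*Y) - E(X)*E(Y) over equally likely pairs."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    mean_xy = Fraction(sum(x * y for x, y in zip(xs, ys)), n)
    return mean_xy - mean_x * mean_y

# Hypothetical data: Y is completely determined by X (Y = X**2).
xs = [-1, 0, 1]
ys = [x ** 2 for x in xs]

cov_xy = covariance(xs, ys)

# Independence would require P(X=1, Y=1) == P(X=1) * P(Y=1).
n = len(xs)
p_joint = Fraction(sum(1 for x, y in zip(xs, ys) if x == 1 and y == 1), n)
p_x = Fraction(sum(1 for x in xs if x == 1), n)
p_y = Fraction(sum(1 for y in ys if y == 1), n)

print(cov_xy)                 # 0
print(p_joint, p_x * p_y)     # 1/3 2/9
```

The covariance is exactly 0, yet the joint probability differs from the product of the marginals, so X and Y are dependent.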
Example 3.2 Covariance analysis of numeric attributes. Consider Table 3.2, which presents a
simplified example of stock prices observed at five time points for AllElectronics and
HighTech, a high-tech company. If the stocks are affected by the same industry trends,
will their prices rise or fall together?
\[
E(\text{AllElectronics}) = \frac{6 + 5 + 4 + 3 + 2}{5} = \frac{20}{5} = \$4
\]
and
\[
E(\text{HighTech}) = \frac{20 + 10 + 14 + 5 + 5}{5} = \frac{54}{5} = \$10.80.
\]
Thus, using Eq. (3.4), we compute
\[
\mathrm{Cov}(\text{AllElectronics}, \text{HighTech}) = \frac{6 \times 20 + 5 \times 10 + 4 \times 14 + 3 \times 5 + 2 \times 5}{5} - 4 \times 10.80 = 50.2 - 43.2 = 7.
\]
Therefore, given the positive covariance, we can say that the stock prices of the two
companies tend to rise together.
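
The arithmetic in Example 3.2 can be checked with a short script. This is a minimal sketch, not part of the original text: the function names `mean`, `covariance_shortcut`, and `covariance_definition` are assumptions chosen for illustration, and the divisor is the number of time points (a population covariance), matching the worked example. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Stock prices from Table 3.2, one value per time point t1..t5.
all_electronics = [6, 5, 4, 3, 2]
high_tech = [20, 10, 14, 5, 5]

def mean(values):
    """Expected value of equally weighted observations, as an exact fraction."""
    return Fraction(sum(values), len(values))

def covariance_shortcut(a, b):
    """Shortcut form: E(A*B) - E(A)*E(B)."""
    products = [x * y for x, y in zip(a, b)]
    return mean(products) - mean(a) * mean(b)

def covariance_definition(a, b):
    """Definitional form: the mean of (A - E(A)) * (B - E(B))."""
    mean_a, mean_b = mean(a), mean(b)
    total = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return total / len(a)

print(float(mean(all_electronics)))                           # 4.0
print(float(mean(high_tech)))                                 # 10.8
print(float(covariance_shortcut(all_electronics, high_tech)))    # 7.0
print(float(covariance_definition(all_electronics, high_tech)))  # 7.0
```

Both forms give 7, which agrees with the hand computation.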
Variance is a special case of covariance, where the two attributes are identical (i.e., the
covariance of an attribute with itself). Variance was discussed in Chapter 2.
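
As a quick check of this special case, the sketch below computes the covariance of the AllElectronics prices with themselves and compares it with the population variance from Python's standard `statistics` module. The helper `covariance` is a hypothetical name introduced for this illustration and divides by the number of observations.

```python
from fractions import Fraction
from statistics import pvariance

# AllElectronics prices from Table 3.2.
all_electronics = [6, 5, 4, 3, 2]

def covariance(a, b):
    """Population covariance: E(A*B) - E(A)*E(B)."""
    n = len(a)
    mean_a = Fraction(sum(a), n)
    mean_b = Fraction(sum(b), n)
    mean_ab = Fraction(sum(x * y for x, y in zip(a, b)), n)
    return mean_ab - mean_a * mean_b

cov_self = covariance(all_electronics, all_electronics)
variance = pvariance(all_electronics)  # population variance (divides by n)

print(cov_self == variance)  # True
```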
3.3.3 Tuple Duplication
In addition to detecting redundancies between attributes, duplication should also be
detected at the tuple level (e.g., where there are two or more identical tuples for a given
unique data entry case). The use of denormalized tables (often done to improve
performance by avoiding joins) is another source of data redundancy. Inconsistencies often
arise between various duplicates, due to inaccurate data entry or updating some but not
all data occurrences. For example, if a purchase order database contains attributes for
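
The two problems described in this subsection, identical tuples and inconsistent copies in a denormalized table, can be sketched as follows. The rows and the attribute names (`order_id`, `customer`, `city`) are hypothetical and chosen only for illustration. The sketch also assumes that each customer should have exactly one city, so that two different cities for the same customer count as an inconsistency.

```python
from collections import Counter, defaultdict

# Hypothetical denormalized rows: (order_id, customer, city).
rows = [
    (1, "Ann", "Boston"),
    (2, "Bob", "Denver"),
    (2, "Bob", "Denver"),   # exact duplicate of the previous tuple
    (3, "Ann", "Austin"),   # same customer, different city
]

# Tuple-level duplication: tuples that occur more than once.
counts = Counter(rows)
exact_duplicates = [row for row, k in counts.items() if k > 1]

# Inconsistency: one customer associated with more than one city.
cities_by_customer = defaultdict(set)
for _order_id, customer, city in rows:
    cities_by_customer[customer].add(city)
inconsistent = {
    customer: sorted(cities)
    for customer, cities in cities_by_customer.items()
    if len(cities) > 1
}

print(exact_duplicates)  # [(2, 'Bob', 'Denver')]
print(inconsistent)      # {'Ann': ['Austin', 'Boston']}
```

Exact duplicates can usually be dropped safely, whereas inconsistent copies need a decision about which value is correct.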
 