(a) CUSTOMER table

    Customer  Customer
    Number    Name      Street            City       State  Country
 1  02847     Mervis    123 Oak St.                  TN     USA
 2  03185     Gomez     345 Main Ave.     Columbus   OH     USA
 3  03480     Taylor    50 Elm Rd.        San Diego  CA     USA
 4  06837     Stevens   876 Leslie Ln.    Raleigh    NC     USA
 5  08362     Adams     1200 Wallaby St.  Brisbane          Australia
 6  12739     Gomez     345 Main Ave.     Columbus   GA     USA
 7  13848     Lucas     742 Ave. Louise   Brussels          Belgium
 8  15367     Tailor    50 Elm Rd.        San Diego  CA     USA
 9  15933     Chang     48 Maple Ave.     Toronto    ON     Canada
10  18575     Smith     390 Martin Dr.    Columbus   RP     USA
11  21359     Sanchez   666 Ave. Bolivar  Santiago          Chile

(b) SALE table

    Book    Customer
    Number  Number    Date             Price  Quantity
 1  426478  03480     May 19, 2003     32.99  1
 2  077656  18575     May 19, 2003     19.95  21
 3  365905  06837     May 19, 2003     24.99  3
 4  645688  21359     May 20, 2003     49.50  1
 5  474640  15367     May 34, 2003   3200.99  1
 6  426478  08362     June 03, 2003    32.99  2
 7  276432  03480     June 04, 2003    30.00  1
 8  365905  12738     June 04, 2003    24.99  1
 9  276432  06837     June 05, 2003    30.00  5
10  327467  18575     June 12, 2003   -32.99  2
11  426478  06837     June 15, 2003    32.99  1

FIGURE 13.12
Good Reading Bookstores sample data prior to data cleaning

Possible Misspelling: Rows 3 and 8 have different customer numbers but are
otherwise identical except for a one-letter difference in the customer name,
"Taylor" vs. "Tailor." Do both rows refer to the same person? For the sake of
argument, say that an online white pages is not available but a real estate listing
indicating which addresses are single-family houses and which are apartment
buildings is. A program could be designed to assume that if the address is a
single-family house, there is a misspelling and the two records refer to the same
person. On the other hand, if the address is an apartment building, they may,
indeed, be two different people.
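
The rule just described could be sketched roughly as follows. This is only an illustration in Python, not the bookstore's actual program: the `dwelling_type` lookup is a hypothetical stand-in for the real estate listing, the classification of 50 Elm Rd. as a single-family house is assumed purely for the example, and the helper names `one_letter_apart` and `likely_duplicates` are invented here.

```python
from itertools import combinations

# Rows 3 and 8 of the CUSTOMER table in Figure 13.12:
# (customer number, name, street, city, state, country)
customers = [
    ("03480", "Taylor", "50 Elm Rd.", "San Diego", "CA", "USA"),
    ("15367", "Tailor", "50 Elm Rd.", "San Diego", "CA", "USA"),
]

# Hypothetical stand-in for the real estate listing: address -> dwelling type.
# The "single-family" value is an assumption made for this example only.
dwelling_type = {("50 Elm Rd.", "San Diego", "CA", "USA"): "single-family"}


def one_letter_apart(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def likely_duplicates(rows, dwellings):
    """Return pairs of customer numbers that probably denote the same person."""
    pairs = []
    for r1, r2 in combinations(rows, 2):
        same_address = r1[2:] == r2[2:]
        if same_address and one_letter_apart(r1[1], r2[1]):
            # Only treat the pair as a misspelling for a single-family house.
            if dwellings.get(r1[2:]) == "single-family":
                pairs.append((r1[0], r2[0]))
    return pairs


print(likely_duplicates(customers, dwelling_type))  # [('03480', '15367')]
```

If the same address were listed as an apartment building, the function would return an empty list and the two records would be left for manual review.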

Impossible Data: Row 10 has a state value of "RP." There is no such state
abbreviation in the U.S. This must be flagged and corrected either automatically
or manually.
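
One way such a check might be automated is to compare each U.S. row's state value against the list of valid postal abbreviations. The Python sketch below only flags the bad value; how it is then corrected is left open. The function name `invalid_us_states` and the reduced row layout are assumptions made for illustration, and only three of the figure's rows are included.

```python
# Two-letter postal abbreviations for the 50 U.S. states plus DC.
US_STATES = set("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY DC
""".split())

# (row, customer number, state, country) taken from Figure 13.12.
customers = [
    (1, "02847", "TN", "USA"),
    (2, "03185", "OH", "USA"),
    (10, "18575", "RP", "USA"),
]


def invalid_us_states(rows):
    """Return (row, customer number, state) for USA rows with an unknown state."""
    return [
        (row, number, state)
        for row, number, state, country in rows
        if country == "USA" and state not in US_STATES
    ]


print(invalid_us_states(customers))  # [(10, '18575', 'RP')]
```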
 