Database Reference
In-Depth Information
+------+----------------+------------+----------------+------------+
| 1787 | Delaware | 1787-12-07 | New Jersey | 1787-12-18 |
| 1787 | Delaware | 1787-12-07 | Pennsylvania | 1787-12-12 |
| 1787 | New Jersey | 1787-12-18 | Delaware | 1787-12-07 |
| 1787 | New Jersey | 1787-12-18 | Pennsylvania | 1787-12-12 |
| 1787 | Pennsylvania | 1787-12-12 | Delaware | 1787-12-07 |
| 1787 | Pennsylvania | 1787-12-12 | New Jersey | 1787-12-18 |
| 1912 | Arizona | 1912-02-14 | New Mexico | 1912-01-06 |
| 1912 | New Mexico | 1912-01-06 | Arizona | 1912-02-14 |
| 1959 | Alaska | 1959-01-03 | Hawaii | 1959-08-21 |
| 1959 | Hawaii | 1959-08-21 | Alaska | 1959-01-03 |
+------+----------------+------------+----------------+------------+
The condition in the ON clause that requires state pair names not to be identical elimi‐
nates the trivially duplicate rows showing that each state joined the Union in the same
year as itself. But you'll notice that each remaining pair of states still appears twice. For
example, there is one row that lists Delaware and New Jersey, and another that lists New
Jersey and Delaware. This is often the case with self-joins: they produce pairs of rows
that contain the same values, but for which the values are not in the same order.
Because the values are not listed in the same order within the rows, they are not identical
and you can't get rid of these “near duplicates” by adding DISTINCT to the statement. To
solve this problem, select rows in such a way that only one row from each pair ever
appears in the query result. Slightly modify the ON clause, from:
ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name <> s2.name
to:
ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name < s2.name
Using < rather than <> selects only those rows in which the first state name is lexically
less than the second, and eliminates rows in which the names appear in opposite order
(as well as rows in which the state names are identical). The resulting query produces
the desired output without duplicates:
mysql> SELECT YEAR(s1.statehood) AS year,
-> s1.name AS name1, s1.statehood AS statehood1,
-> s2.name AS name2, s2.statehood AS statehood2
-> FROM states AS s1 INNER JOIN states AS s2
-> ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name < s2.name
-> ORDER BY year, name1, name2;
+------+----------------+------------+----------------+------------+
| year | name1 | statehood1 | name2 | statehood2 |
+------+----------------+------------+----------------+------------+
| 1787 | Delaware | 1787-12-07 | New Jersey | 1787-12-18 |
| 1787 | Delaware | 1787-12-07 | Pennsylvania | 1787-12-12 |
| 1787 | New Jersey | 1787-12-18 | Pennsylvania | 1787-12-12 |
| 1912 | Arizona | 1912-02-14 | New Mexico | 1912-01-06 |
Search WWH ::




Custom Search