Database Reference
In-Depth Information
We would then need to use our judgment to decide whether one or both of those columns are
unique by chance, or as part of the design of the database that created them.
It's possible that there is no single column that uniquely identifies the rows. Multiple column
keys are fairly common. If none of the columns were unique, then we would start to look for
unique keys that are combinations of the most unique columns. The following query shows
a frequency distribution for the table: a value occurs twice in one case, and another value
occurs only once.
postgres=# SELECT count as num_of_values, count(*)
FROM (SELECT customerid, count(*)
FROM ord
GROUP BY customerid) s
GROUP BY count
ORDER BY count(*);
num_of_values | count
---------------+-------
2 | 1
1 | 1
(2 rows)
and we can change the query to include multiple columns, like the following:
SELECT count as num_of_values, count(*)
FROM (SELECT column1, column2.... columnN
,count(*)
FROM ord
GROUP BY column1, column2.... columnN
) s
GROUP BY count
ORDER BY count(*);
This query will result in just one row, once we find a set of columns that is unique. As we
get closer to finding the key, we will see that the distribution gets tighter and tighter.
How it works...
Finding a unique key is possible for a program, though in most cases, a human can do this
much faster by looking at things like column names, foreign keys, or business understanding
to reduce the number of searches required by the brute-force approach.
ANALYZE works by taking a sample of the table data, and then performing a statistical
analysis of the results. The n_distinct value is the estimate of the "number of distinct
values" for the column.
 
Search WWH ::




Custom Search