Databases Reference
In-Depth Information
All that remains is to verify that the trigger maintains the hash:
mysql> INSERT INTO pseudohash (url) VALUES ('http://www.mysql.com');
mysql> SELECT * FROM pseudohash;
+----+----------------------+------------+
| id | url | url_crc |
+----+----------------------+------------+
| 1 | http://www.mysql.com | 1560514994 |
+----+----------------------+------------+
mysql> UPDATE pseudohash SET url='http://www.mysql.com/' WHERE id=1;
mysql> SELECT * FROM pseudohash;
+----+---------------------- +------------+
| id | url | url_crc |
+----+---------------------- +------------+
| 1 | http://www.mysql.com/ | 1558250469 |
+----+---------------------- +------------+
If you use this approach, you should not use SHA1() or MD5() hash functions. These
return very long strings, which waste a lot of space and result in slower comparisons.
They are cryptographically strong functions designed to virtually eliminate collisions,
which is not your goal here. Simple hash functions can offer acceptable collision rates
with better performance.
If your table has many rows and CRC32() gives too many collisions, implement your
own 64-bit hash function. Make sure you use a function that returns an integer, not a
string. One way to implement a 64-bit hash function is to use just part of the value
returned by MD5() . This is probably less efficient than writing your own routine as a
user-defined function (see Chapter 7 ), but it'll do in a pinch:
mysql> SELECT CONV(RIGHT(MD5('http://www.mysql.com/'), 16), 16, 10) AS HASH64;
+---------------------+
| HASH64 |
+---------------------+
| 9761173720318281581 |
+---------------------+
When you search for a value by its hash, you must also include
the literal value in your WHERE clause:
mysql> SELECT id FROM url WHERE url_crc=CRC32("http://www.mysql.com")
-> AND url="http://www.mysql.com";
Handling hash collisions.
The following query will not work correctly, because if another URL has the CRC32()
value 1560514994, the query will return both rows:
mysql> SELECT id FROM url WHERE url_crc=CRC32("http://www.mysql.com");
The probability of a hash collision grows much faster than you might think, due to the
so-called Birthday Paradox. CRC32() returns a 32-bit integer value, so the probability of
a collision reaches 1% with as few as 93,000 values. To illustrate this, we loaded all the
words in /usr/share/dict/words into a table along with their CRC32() values, resulting in
98,569 rows. There is already one collision in this set of data! The collision makes the
following query return more than one row:
 
Search WWH ::




Custom Search