MySQL, you might want “mysql” to be a stopword, because it's too common to be
helpful.
You can often improve performance by skipping short words. The length is configurable
with the ft_min_word_len parameter. Increasing the default value will skip more
words, making your index smaller and faster, but less accurate. Also bear in mind that
for special purposes, you might need very short words. For example, a full-text search
of consumer electronics products for the query “cd player” is likely to produce lots of
irrelevant results unless short words are allowed in the index. A user searching for “cd
player” won't want to see MP3 and DVD players in the results, but if the minimum
word length is the default four characters, the search will actually be for just “player,”
so all types of players will be returned.
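The effect of the minimum word length on the “cd player” query can be modeled with
a few lines of Python. This is only a rough sketch of the filtering idea, not MySQL's
actual parser: the function name searchable_terms, the regular expression used to
split words, and the stopwords argument are all assumptions made for illustration.

```python
import re


def searchable_terms(query, min_word_len=4, stopwords=frozenset()):
    """Return the query words that survive length and stopword filtering."""
    # Simplified tokenizer: lowercase runs of letters, digits, "_" and "'".
    words = re.findall(r"[a-z0-9_']+", query.lower())
    return [w for w in words if len(w) >= min_word_len and w not in stopwords]


# With a minimum length of four, "cd" is dropped and only "player" is searched.
print(searchable_terms("cd player"))                  # ['player']
# Lowering the minimum length keeps the short but significant word.
print(searchable_terms("cd player", min_word_len=2))  # ['cd', 'player']
```

The same model shows why a site-specific stopword matters: passing
stopwords={"mysql"} removes that word from every query, whatever its length.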
The stopword list and the minimum word length can improve search speeds by keeping
some words out of the index, but the search quality can suffer as a result. The right
balance is application-dependent. If you need good performance and good-quality
results, you'll have to customize both parameters for your application. It's a good idea
to build in some logging and then investigate common searches, uncommon searches,
searches that don't return results, and searches that return a lot of results. You can gain
insight about your users and your searchable content this way, and then use that insight
to improve performance and the quality of your search results.
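As one possible starting point for that investigation, the sketch below summarizes a
search log in Python. It assumes the application already records each search as a
(query, result_count) pair; that log format, the function name summarize_search_log,
and the threshold of 1,000 results are hypothetical choices, not anything MySQL
provides.

```python
from collections import Counter


def summarize_search_log(entries, large_result_threshold=1000, top_n=3):
    """Group logged searches into the categories worth investigating.

    entries -- iterable of (query, result_count) pairs (assumed log format)
    """
    frequency = Counter()
    no_results = set()
    large_results = set()
    for query, result_count in entries:
        # Treat "CD  Player" and "cd player" as the same search.
        normalized = " ".join(query.lower().split())
        frequency[normalized] += 1
        if result_count == 0:
            no_results.add(normalized)
        elif result_count >= large_result_threshold:
            large_results.add(normalized)
    return {
        "most_common": frequency.most_common(top_n),
        "rare": sorted(q for q, n in frequency.items() if n == 1),
        "no_results": sorted(no_results),
        "large_results": sorted(large_results),
    }
```

Queries that appear under no_results are candidates for words that the stopword list
or the minimum word length is keeping out of the index, while queries under
large_results may point to words that are too common to be useful.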
Be aware that if you change the minimum word length, you'll have to
rebuild the index with OPTIMIZE TABLE for the change to take effect. A
related parameter is ft_max_word_len, which is mainly a safeguard to
avoid indexing very long keywords.
If you're importing a lot of data into a server and you want full-text indexing on some
columns, disable the full-text indexes before the import with DISABLE KEYS and enable
them afterward with ENABLE KEYS. This is usually much faster because of the high cost
of updating the index for each row inserted, and you'll get a defragmented index as a
bonus.
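The import pattern might look like the following Python sketch. It assumes a MyISAM
table and a DB-API cursor from a MySQL driver that uses %s placeholders; the
function name bulk_load and its arguments are hypothetical, and the table and column
names are interpolated directly, so they must come from trusted code rather than
from user input.

```python
def bulk_load(cursor, table, columns, rows):
    """Insert rows with index maintenance suspended, then rebuild once.

    cursor         -- DB-API cursor using %s placeholders (assumed driver style)
    table, columns -- trusted identifiers; they are interpolated, not escaped
    rows           -- sequence of tuples, one value per column
    """
    column_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    insert_sql = f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"

    cursor.execute(f"ALTER TABLE `{table}` DISABLE KEYS")
    try:
        cursor.executemany(insert_sql, rows)
    finally:
        # Re-enable even if the import fails, so the indexes are not
        # left disabled on a half-loaded table.
        cursor.execute(f"ALTER TABLE `{table}` ENABLE KEYS")
```

Putting ENABLE KEYS in a finally block is a design choice in this sketch: the index
rebuild then runs whether or not the import succeeds.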
For large datasets, you might need to manually partition the data across many nodes
and search them in parallel. This is a difficult task, and you might be better off using
an external full-text search engine, such as Lucene or Sphinx. Our experience shows
they can have orders of magnitude better performance.
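To give a sense of what manual partitioning involves, here is a minimal Python sketch
of the scatter-and-merge step. Each partition is represented by a callable that takes
(query, limit) and returns (score, doc_id) pairs; those callables, the function name
search_all_partitions, and the result format are hypothetical stand-ins for whatever
code would actually query each node.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def search_all_partitions(partitions, query, limit=10):
    """Run the query on every partition concurrently and merge the best hits.

    partitions -- callables taking (query, limit) and returning a list of
                  (score, doc_id) pairs (hypothetical per-node search functions)
    """
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_node = list(pool.map(lambda search: search(query, limit), partitions))
    # Flatten the per-node lists and keep the highest-scoring hits overall.
    all_hits = (hit for hits in per_node for hit in hits)
    return heapq.nlargest(limit, all_hits)
```

The sketch leaves out most of what makes the task difficult: node failures and
timeouts, and the fact that relevance scores computed on separate nodes are not
necessarily comparable, because each node ranks against its own subset of the data.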
Distributed (XA) Transactions
Whereas storage engine transactions (see “Transactions” on page 6) give ACID properties
inside the storage engine, a distributed (XA) transaction is a higher-level transaction
that can extend some ACID properties outside the storage engine, and even
 