Then there are, of course, the doc, length, and cs members, which give us the text to
parse, its length in bytes, and its character set. The length is essential because the text
is not necessarily zero terminated. The character set is important, but unfortunately
not all MySQL functions that work with it are part of the Plugin API. By using them
we would inevitably break the versioning protection and risk a crash any time
MySQL developers make a change in the character set code. A complete solution to
this problem will come after MySQL 5.1, in the framework of Server Services, which
are described in the Appendix. Meanwhile, we will ignore the character set in our
plugins below.
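Because doc is not guaranteed to be zero terminated, any code that hands it to a C string function (fopen(), strlen(), and so on) should first build a terminated copy from doc and length. The following is a minimal sketch of that idea. The helper name copy_doc_as_cstring() is our own invention and is not part of the Plugin API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the Plugin API): returns a freshly
 * allocated, zero-terminated copy of the first `length` bytes of `doc`,
 * or NULL if the allocation fails. The caller must free() the result.
 */
static char *copy_doc_as_cstring(const char *doc, int length)
{
    assert(doc != NULL && length >= 0);

    char *copy = malloc((size_t)length + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, doc, (size_t)length);  /* copy exactly `length` bytes */
    copy[length] = '\0';                /* add the terminator ourselves */
    return copy;
}
```

Inside a parser function, this would presumably be called as copy_doc_as_cstring(param->doc, param->length), with the result released by free() once it is no longer needed.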
The flags element is there to set flags, although currently only one flag is
available: MYSQL_FTFLAGS_NEED_COPY. To understand it, we need an
example. Let's say we have an "extractor" plugin that lets us index files while storing
only their names in the database. That is, we treat doc as a filename, open the file,
and use mysql_parse() to parse its contents. Such a plugin can be implemented like this:
char buf[1024];
FILE *f = fopen(param->doc, "r");    /* doc holds the file name */
while (fgets(buf, sizeof(buf), f))   /* each line overwrites buf */
  param->mysql_parse(param, buf, strlen(buf));
fclose(f);
This is, of course, a very simplified example. It has no error checking, and
param->doc may not be zero terminated, but it shows the problem. The parsing
function mysql_parse() will find words in our buf and will call mysql_add_word()
with pointers to those words, that is, pointers into our buf. However, when we read
the next line from the file, buf will be overwritten with new content, and the old
words, defined as pointers into it, will turn into garbage. Furthermore, when we
return from our parse() function, buf will cease to exist, because it is declared as a
local variable on the stack. For our example to work, we want mysql_add_word() either
to make a copy of every word that it needs, or to use and discard words right away
and not expect them to persist. We tell mysql_add_word() that it needs to copy
words by setting the MYSQL_FTFLAGS_NEED_COPY flag. Sometimes MySQL sets this flag
itself before invoking our parse() function, if it needs mysql_add_word() to copy
words even when the plugin does not require it. Note that copying all words (and
allocating memory for them) adds significant overhead to full-text processing, so
set this flag only when it is really needed. For example, MySQL's built-in
mysql_parse() function does not need to set it; it defines words as pointers
into the doc text, and all words stay valid as long as the doc text itself does, which
is long enough for MySQL and does not require copying of words.
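The effect of the flag can be illustrated outside the server with a small, self-contained simulation. Everything in the sketch below is a hypothetical stand-in rather than the real Plugin API: struct collector plays the role of the server's word storage, add_word() and parse() imitate mysql_add_word() and mysql_parse(), and NEED_COPY imitates MYSQL_FTFLAGS_NEED_COPY. The buffer is supplied by the caller so that the stale pointers can be inspected safely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 16
#define NEED_COPY 1  /* hypothetical stand-in for MYSQL_FTFLAGS_NEED_COPY */

/* Hypothetical stand-in for the server's word storage. */
struct collector {
    int flags;
    int count;
    const char *word[MAX_WORDS];  /* start of each word (not terminated) */
    int len[MAX_WORDS];
};

/* Stand-in for mysql_add_word(): keeps a pointer, or a copy if asked to. */
static int add_word(struct collector *c, const char *w, int len)
{
    if (c->count >= MAX_WORDS)
        return 1;
    if (c->flags & NEED_COPY) {
        char *copy = malloc((size_t)len);
        if (copy == NULL)
            return 1;
        memcpy(copy, w, (size_t)len);
        c->word[c->count] = copy;
    } else {
        c->word[c->count] = w;    /* points into the caller's buffer */
    }
    c->len[c->count] = len;
    c->count++;
    return 0;
}

/* Stand-in for mysql_parse(): splits on spaces and newlines. */
static int parse(struct collector *c, const char *doc, int len)
{
    int i = 0;
    while (i < len) {
        while (i < len && (doc[i] == ' ' || doc[i] == '\n'))
            i++;
        int start = i;
        while (i < len && doc[i] != ' ' && doc[i] != '\n')
            i++;
        if (i > start && add_word(c, doc + start, i - start))
            return 1;
    }
    return 0;
}

/* Feeds every line through one reused buffer, like the fgets() loop. */
static int feed_lines(struct collector *c, const char *const *lines, int n,
                      char *buf, size_t bufsize)
{
    for (int i = 0; i < n; i++) {
        size_t len = strlen(lines[i]);
        if (len >= bufsize)
            return 1;
        memcpy(buf, lines[i], len + 1);  /* overwrites the previous line */
        if (parse(c, buf, (int)len))
            return 1;
    }
    return 0;
}

/* Releases copied words; pointer-only words are not owned by us. */
static void collector_free(struct collector *c)
{
    if (c->flags & NEED_COPY)
        for (int i = 0; i < c->count; i++)
            free((void *)c->word[i]);
    c->count = 0;
}
```

Feeding the lines "alpha" and "omega" through the same buffer without NEED_COPY leaves both stored words pointing at "omega", because the second line has overwritten the first. With NEED_COPY set, the first word still reads "alpha". In a real plugin, the equivalent step would presumably be param->flags |= MYSQL_FTFLAGS_NEED_COPY before calling mysql_parse().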
 