Type                    Description

FT_TOKEN_LEFT_PAREN     Start of a subexpression. In the default MySQL settings this
                        corresponds to the ( (left parenthesis) full-text search
                        operator. This token type is also used to mark the beginning
                        of a phrase (because a phrase is also a subexpression, in a
                        sense). MySQL distinguishes between these two cases by
                        looking at the quot member.

FT_TOKEN_RIGHT_PAREN    Right parenthesis; end of a subexpression or of a phrase
                        search.
For the last two token types, mysql_add_word() ignores the word argument.
So, in a simple case we can use MYSQL_FTPARSER_BOOLEAN_INFO as follows:
MYSQL_FTPARSER_BOOLEAN_INFO boolean_info =
{ FT_TOKEN_WORD, 0, 0, 0, 0, 0, 0 };
param->mysql_add_word(param, word, len, &boolean_info);
The official documentation for all of these structures, constants, and functions can
be found in the plugin header file plugin.h. It is well worth looking through when
writing plugins of this type.
A PHP full-text parser
To show the layout of a full-text parser plugin, we will create a simple parser for
PHP scripts. PHP syntax has a few peculiarities that the MySQL built-in full-text
parser does not take into account. In particular, all variable names in PHP start with
a dollar sign, which is, in fact, part of the name; a variable $while is not the same
as a loop statement while. But a dollar sign is not just another character that can be
used in variable names: the string "$foo$bar" contains two PHP variables, not one.
Also, variables can have different scopes; a variable foo::$bar is not the same as a
variable $bar. Let's try to handle this in our full-text parser plugin. According to the
above, it will be a "tokenizer" plugin, that is, a plugin that splits the text into words.
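Before turning to the plugin itself, the splitting rules just described can be tried out independently of MySQL. The following is only a minimal standalone sketch, not the plugin code: the names php_tokenize, scan_segment, php_token_cb, token_list, and collect are all hypothetical, identifiers are assumed to be ASCII letters, digits, and underscores, and "::" is assumed to join a class name to the member that follows it. A dollar sign always starts a new word, so "$foo$bar" yields two words, while "foo::$bar" stays one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: called once for every word found. */
typedef void (*php_token_cb)(const char *word, size_t len, void *ctx);

/* ASCII-only approximation of the characters allowed in a PHP name. */
static int is_ident_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Scan one segment: an optional '$' followed by at least one name character.
   Returns the index just past the segment, or pos if no segment starts here. */
static size_t scan_segment(const char *doc, size_t len, size_t pos)
{
    size_t i = pos;
    size_t name_start;

    if (i < len && doc[i] == '$')
        i++;
    name_start = i;
    while (i < len && is_ident_char((unsigned char)doc[i]))
        i++;
    return (i > name_start) ? i : pos;
}

/* Split doc into words and pass each one to emit(). Returns the word count. */
static size_t php_tokenize(const char *doc, size_t len,
                           php_token_cb emit, void *ctx)
{
    size_t i = 0, count = 0;

    assert(doc != NULL && emit != NULL);
    while (i < len) {
        size_t end = scan_segment(doc, len, i);
        if (end == i) {          /* not the start of a word: skip one char */
            i++;
            continue;
        }
        /* Extend the word across "::" while another segment follows it. */
        while (end + 2 <= len && doc[end] == ':' && doc[end + 1] == ':') {
            size_t next = scan_segment(doc, len, end + 2);
            if (next == end + 2)
                break;
            end = next;
        }
        emit(doc + i, end - i, ctx);
        count++;
        i = end;                 /* a following '$' begins the next word */
    }
    return count;
}

/* Example callback: keeps up to 16 short words as NUL-terminated strings. */
struct token_list { char words[16][64]; size_t n; };

static void collect(const char *word, size_t len, void *ctx)
{
    struct token_list *tl = ctx;

    if (tl->n < 16 && len < 64) {
        memcpy(tl->words[tl->n], word, len);
        tl->words[tl->n][len] = '\0';
        tl->n++;
    }
}
```

In a real plugin the callback would be replaced by a call to param->mysql_add_word(), as in the snippet shown earlier; the callback indirection here only keeps the sketch free of MySQL headers so that it can be compiled and tested on its own.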
As usual, we start by including the required header files:
#include <mysql/plugin.h>
#include <stdio.h>
#include <ctype.h>
 