Table 11.2 Simple C File Tokenized
Although it certainly is possible to write a tokenizer (also known as a scanner
or lexer) by hand, it definitely is not recommended. Manually writing such code
is extremely error prone simply because there are so many cases that must be
handled. For example, a tokenizer for C++ must be able to recognize that only one
of these is the actual new keyword:
newnew
_new
new_new
_new_
new
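To make the difficulty concrete, here is a minimal hand-written sketch in C++ that counts how often new occurs as a keyword. The function names (isIdentStart, isIdentChar, countNewKeywords) are hypothetical and chosen only for this illustration. The sketch assumes the common approach of reading the longest possible run of identifier characters before comparing against the keyword, and it deliberately ignores string literals, comments, and numeric literals, all of which a real C++ tokenizer must also handle.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: may this character begin an identifier or keyword?
bool isIdentStart(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: may this character continue an identifier or keyword?
bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Counts occurrences of the keyword `new` in source.
// Simplification: literals and comments are not recognized.
int countNewKeywords(const std::string& source) {
    int count = 0;
    std::size_t i = 0;
    while (i < source.size()) {
        if (isIdentStart(source[i])) {
            // Consume the longest run of identifier characters first...
            std::size_t start = i;
            while (i < source.size() && isIdentChar(source[i])) {
                ++i;
            }
            // ...and only then check whether the whole run is exactly "new".
            if (source.compare(start, i - start, "new") == 0) {
                ++count;
            }
        } else {
            ++i;  // Skip whitespace, punctuation, and anything else.
        }
    }
    return count;
}
```

Given the five candidates listed above separated by spaces, this function counts only the last one. Even this small piece of logic needs care, and it covers just a single keyword.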
Instead of manually writing a tokenizer, it is preferable to use a tool such as flex,
which generates a tokenizer for you. You give flex a series of matching rules,
known as regular expressions, and state which regular expressions match which
tokens. It then automatically generates code (in the case of flex, in C) for a
tokenizer that emits the correct tokens based on the given rules.
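The idea of pairing regular expressions with tokens can be sketched without flex itself. The following C++ example is not flex and not its generated output; it is a small stand-in built on std::regex, with hypothetical names (Token, Rule, tokenize) and a hypothetical rule set. It assumes the usual convention that the longest match wins and that, on a tie, the rule listed first wins.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds, chosen only for this example.
enum class Token { New, Identifier, Number, Symbol };

// One matching rule: a regular expression and the token it produces.
struct Rule {
    std::regex pattern;
    Token token;
};

using TokenList = std::vector<std::pair<Token, std::string>>;

TokenList tokenize(const std::string& src) {
    // Rules in priority order: earlier rules win ties in match length.
    static const std::vector<Rule> rules = {
        {std::regex("new"), Token::New},
        {std::regex("[A-Za-z_][A-Za-z0-9_]*"), Token::Identifier},
        {std::regex("[0-9]+"), Token::Number},
        {std::regex("."), Token::Symbol},  // any other single character
    };

    TokenList out;
    auto it = src.cbegin();
    while (it != src.cend()) {
        if (std::isspace(static_cast<unsigned char>(*it))) {
            ++it;  // Whitespace separates tokens but is not emitted.
            continue;
        }
        std::ptrdiff_t bestLen = 0;
        Token bestToken = Token::Symbol;
        for (const Rule& rule : rules) {
            std::smatch m;
            // match_continuous anchors the match at the current position.
            bool matched = std::regex_search(
                it, src.cend(), m, rule.pattern,
                std::regex_constants::match_continuous);
            if (matched && m.length(0) > bestLen) {
                bestLen = m.length(0);
                bestToken = rule.token;
            }
        }
        if (bestLen == 0) {
            ++it;  // No rule matched; skip the character defensively.
            continue;
        }
        out.emplace_back(bestToken, std::string(it, it + bestLen));
        it += bestLen;
    }
    return out;
}
```

With these rules, the input x = new newnew; yields an identifier, a symbol, the new keyword, the identifier newnew, and a final symbol. A generated tokenizer is typically far faster than this loop because it compiles all of the rules into a single state machine rather than trying each one in turn.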
Regular Expressions
Regular expressions (also known as regex) have many uses beyond tokenization.
For example, most IDEs can search across code files using regular expressions,
which can be a very handy way to find specific types of sequences. Although
general regular expressions can become rather complex, matching the patterns of
a scripting language requires only a very small subset of regular expressions.
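As a rough illustration of how small that subset can be, the sketch below uses only character classes, the * and + repetition operators, and an optional group. The function names and the exact patterns are assumptions made for this example, not rules taken from any particular scripting language.

```cpp
#include <regex>
#include <string>

// A letter or underscore, followed by any number of letters,
// digits, or underscores.
bool isIdentifier(const std::string& s) {
    static const std::regex pattern("[A-Za-z_][A-Za-z0-9_]*");
    return std::regex_match(s, pattern);
}

// One or more digits, optionally followed by a dot and more digits.
bool isNumber(const std::string& s) {
    static const std::regex pattern("[0-9]+(\\.[0-9]+)?");
    return std::regex_match(s, pattern);
}
```

Because std::regex_match requires the whole string to match, isIdentifier accepts _new_ but rejects 2fast, and isNumber accepts 42 and 3.14 but rejects a trailing dot such as 3. on its own.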