Scripting Languages and Data Formats - Game Programming Algorithms and Techniques

Table 11.2 Simple C File Tokenized

Although it certainly is possible to write a tokenizer (also known as a scanner or lexer) by hand, it definitely is not recommended. Manually writing such code is extremely error prone simply because there are so many cases that must be handled. For example, a tokenizer for C++ must be able to recognize that only one of these is the actual new keyword:

newnew

_new

new_new

_new_

new
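
To make the difficulty concrete, here is a minimal sketch in Python of the check a hand-written tokenizer would need for this one case. The language choice and the helper names scan_word and is_new_keyword are assumptions made for illustration, and only ASCII identifiers are considered. The idea is to read the longest identifier-like run first and only then compare the whole run against the keyword.

```python
import string

# Characters allowed in an identifier (ASCII only, an assumption for brevity).
IDENT_START = set(string.ascii_letters + "_")
IDENT_BODY = IDENT_START | set(string.digits)


def scan_word(text, pos=0):
    """Return the longest identifier-like run starting at pos, or "" if none."""
    if pos >= len(text) or text[pos] not in IDENT_START:
        return ""
    end = pos + 1
    while end < len(text) and text[end] in IDENT_BODY:
        end += 1
    return text[pos:end]


def is_new_keyword(text, pos=0):
    """True only when the entire word at pos is exactly `new`."""
    return scan_word(text, pos) == "new"


candidates = ["newnew", "_new", "new_new", "_new_", "new"]
results = {word: is_new_keyword(word) for word in candidates}
```

Checking for the three characters n, e, w alone would accept four of the five candidates, which is why the whole word has to be scanned before it is classified. A full tokenizer needs similar care for every keyword, operator, and literal.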

Instead of manually writing a tokenizer, it is preferable to use a tool such as flex, which generates a tokenizer for you. You give flex a series of matching rules, known as regular expressions, and state which regular expressions match which tokens. It then automatically generates code (in the case of flex, in C) for a tokenizer that emits the correct tokens based on the given rules.
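
The rule-driven approach can be imitated by hand to show the idea. The following Python sketch is neither flex input nor flex output: the rule table, the token names, and the tokenize function are all hypothetical. It pairs each regular expression with a token and, at each position, takes the longest match, preferring the earlier rule on a tie, which is the same policy flex uses to resolve competing rules.

```python
import re

# Hypothetical rule table: (token name, regular expression), in priority order.
RULES = [
    ("NEW", r"new"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("SYMBOL", r"[=;(){}+\-*/]"),
    ("WHITESPACE", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Split text into (token name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "WHITESPACE":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


tokens = tokenize("x = new new_new;")
```

With these rules, the standalone new is emitted as the NEW token because it ties with IDENTIFIER and is listed first, while new_new is emitted as an IDENTIFIER because that rule matches more characters.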

Regular Expressions

Regular expressions (also known as regex) have many uses beyond tokenization. For example, most IDEs can perform a search across code files using regular expressions, which can be a very handy way to find specific types of sequences. Although general regular expressions can become rather complex, matching patterns for a scripting language requires only a very small subset of regular expressions.
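
As a small illustration of the kind of search described above, the following Python sketch looks for every call to a function whose name begins with Get. The source text and the function names in it are invented for the example. It uses only a literal prefix, a character class, and the + repetition operator, which is representative of the small subset needed.

```python
import re

# Hypothetical snippet standing in for the contents of a code file.
source = """\
int health = GetHealth(player);
SetHealth(player, health - 10);
float speed = GetMaxSpeed();
"""

# "Get", then one or more identifier characters, then an opening parenthesis.
pattern = re.compile(r"Get[A-Za-z0-9_]+\(")

matches = [m.group(0) for m in pattern.finditer(source)]
```

Here matches contains the two Get calls and skips SetHealth, something a plain substring search for "Health(" could not distinguish.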
