Scanning Input with Grammatical Structure
Problem
You need to parse a file whose structure can be described as “grammatical” (in the sense of
computer languages, not natural languages).
Solution
Use one of many parser generators.
Discussion
Although the StreamTokenizer class (see Scanning Input with StreamTokenizer) and Scanner
(see Scanning Input with the Scanner Class) are useful, they know only a limited number
of tokens and have no way of specifying that the tokens must appear in a particular order. To
do more advanced scanning, you need some special-purpose scanning tools. "Parser generators"
have a long history in computer science. The best-known examples are the C-language
yacc (Yet Another Compiler Compiler) and lex, released with Seventh Edition Unix in the
1970s and discussed in lex & yacc (O'Reilly), and their open source clones bison and flex.
These tools let you specify the lexical structure of your input using some pattern language
such as regular expressions (see Chapter 4). For example, you might say that an email address
consists of a series of alphanumerics, followed by an at sign (@), followed by a series
of alphanumerics with periods embedded, as:
name: [A-Za-z0-9]+@[A-Za-z0-9.]+
The tool then writes code that recognizes the characters you have described. These tools also
have a grammatical specification, which says, for example, that the keyword ADDRESS must
appear, followed by a colon, followed by a “name” token, as previously defined.
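To make the split between lexical rules and grammatical rules concrete, here is a small hand-written Java sketch of what a parser for the ADDRESS rule has to do: a regex-based lexer turns the input into tokens, and a parse method checks that the tokens arrive in the required order. It is not the output of ANTLR, JavaCC, or any other generator (generated code is typically far more general), and the names AddressParser, Kind, Token, lex, and parseAddress are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressParser {
    // Token kinds for this toy language (hypothetical, for illustration only)
    enum Kind { ADDRESS, COLON, NAME }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Lexer: skip leading whitespace, then try each token kind in turn.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(ADDRESS)|(:)|([A-Za-z0-9]+@[A-Za-z0-9.]+))");

    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                if (input.substring(pos).isBlank()) break; // only trailing whitespace left
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            }
            if (m.group(1) != null) tokens.add(new Token(Kind.ADDRESS, m.group(1)));
            else if (m.group(2) != null) tokens.add(new Token(Kind.COLON, m.group(2)));
            else tokens.add(new Token(Kind.NAME, m.group(3)));
            pos = m.end();
        }
        return tokens;
    }

    // Grammar rule:  address : ADDRESS ':' NAME ;
    // Returns the name text, or throws if the tokens are missing or out of order.
    static String parseAddress(String input) {
        List<Token> t = lex(input);
        expect(t, 0, Kind.ADDRESS);
        expect(t, 1, Kind.COLON);
        expect(t, 2, Kind.NAME);
        if (t.size() != 3) throw new IllegalArgumentException("Extra input after name");
        return t.get(2).text;
    }

    private static void expect(List<Token> t, int i, Kind kind) {
        if (i >= t.size() || t.get(i).kind != kind) {
            throw new IllegalArgumentException("Expected " + kind + " at token " + i);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAddress("ADDRESS: ian@example.com")); // ian@example.com
        try {
            parseAddress(": ADDRESS ian@example.com"); // right tokens, wrong order
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The second input contains exactly the right tokens and is still rejected. That ordering constraint is what StreamTokenizer and Scanner cannot express on their own, and it is what a parser generator writes for you from the grammar.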
There are several good third-party parser generator tools for Java. They vary widely based on
complexity, power, ease of use, and so on:
▪ One of the best known and most elaborate is ANTLR.
▪ JavaCC is an open source project, originally hosted on java.net.