Scanning Input with Grammatical Structure
Problem
You need to parse a file whose structure can be described as “grammatical” (in the sense of
computer languages, not natural languages).
Solution
Use one of many parser generators.
Discussion
Although classes such as Scanner (see Scanning Input with the Scanner Class) are useful, they
know only a limited number
of tokens and have no way of specifying that the tokens must appear in a particular order. To
do more advanced scanning, you need some special-purpose scanning tools. “Parser generators”
have a long history in computer science. The best-known examples are the C-language yacc
(Yet Another Compiler Compiler) and lex, released with Seventh Edition Unix in the 1970s and
discussed in lex & yacc (O'Reilly), and their open source clones bison and flex.
These tools let you specify the lexical structure of your input using some pattern language
such as regular expressions (see Chapter 4). For example, you might say that an email address
consists of a series of alphanumerics, followed by an at sign (@), followed by a series
of alphanumerics with periods embedded, as:
name: [A-Za-z0-9]+@[A-Za-z0-9.]+
The tool then writes code that recognizes the characters you have described. These tools also
have a grammatical specification, which says, for example, that the keyword ADDRESS must
appear, followed by a colon, followed by a “name” token, as previously defined.
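To make the two layers concrete, here is a minimal hand-written sketch of what such a tool automates. It is not the output of any parser generator: the class name AddressLineDemo and the method parseAddress are hypothetical, the "name" token is approximated with java.util.regex, and allowing optional whitespace around the colon is an assumption. Folding the grammar rule into a single regular expression works here only because the rule is flat; a real grammar with nesting or recursion is exactly where a parser generator earns its keep.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hand-written stand-in for code a parser generator would produce (illustrative only). */
public class AddressLineDemo {

    // Lexical rule: the "name" token described in the text.
    static final String NAME = "[A-Za-z0-9]+@[A-Za-z0-9.]+";

    // Grammatical rule: the keyword ADDRESS, then a colon, then one name token.
    // Optional whitespace around the colon is an assumption of this sketch.
    static final Pattern ADDRESS_LINE =
        Pattern.compile("ADDRESS\\s*:\\s*(" + NAME + ")");

    /** Returns the name token if the whole line obeys the rule, otherwise null. */
    static String parseAddress(String line) {
        Matcher m = ADDRESS_LINE.matcher(line.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Tokens in the required order: prints ian@example.com
        System.out.println(parseAddress("ADDRESS: ian@example.com"));
        // Same tokens in the wrong order: prints null
        System.out.println(parseAddress("ian@example.com :ADDRESS"));
    }
}
```

The second call is the case that a token-only scanner cannot reject by itself: every piece is a valid token, but the order violates the rule.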
There are several good third-party parser generator tools for Java. They vary widely in
complexity, power, ease of use, and so on:
▪ JavaCC is an open source project on java.net.