The argument to the StreamTokenizer constructor is the original standard input stream, System.in, inside an InputStreamReader object that converts the bytes to Unicode, inside a BufferedReader that supplies the stream of Unicode characters via a buffer in memory.
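The nesting described above can be sketched as follows. This is a minimal illustration rather than the original listing: the class name KeyboardTokenizer and the helper tokenizerFor() are hypothetical names introduced here, and the InputStreamReader is assumed to use the platform's default character encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;

public class KeyboardTokenizer {
    // Hypothetical helper (not from the original text): wraps a byte stream
    // in the reader layers described above and hands it to a StreamTokenizer.
    static StreamTokenizer tokenizerFor(InputStream in) {
        return new StreamTokenizer(        // parses characters into tokens
            new BufferedReader(            // buffers the characters in memory
                new InputStreamReader(in)  // converts bytes to Unicode characters
            )
        );
    }

    public static void main(String[] args) throws IOException {
        StreamTokenizer tokenizer = tokenizerFor(System.in);
        System.out.println("Enter a word or a number:");
        int tokenType = tokenizer.nextToken();  // waits for keyboard input
        System.out.println("Token type code: " + tokenType);
    }
}
```

Taking an InputStream parameter instead of hard-coding System.in is a design choice made for this sketch so that the same wrapping can be tried with other byte streams.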
Before we can make use of our StreamTokenizer object for keyboard input, we need to understand a bit more about how it works.
Tokenizing a Stream
The StreamTokenizer class defines objects that can read an input stream and parse it into tokens. The input stream is read and treated as a series of separate bytes, and each byte is regarded as a character in the range '\u0000' to '\u00FF'. A StreamTokenizer object in its default state can recognize the following kinds of tokens:
Token        Description

Numbers      A sequence consisting of the digits 0 to 9, plus possibly a
             decimal point and a leading - sign. A leading + sign is not
             treated as part of the number.

Strings      Any sequence of characters between a pair of single quotes or
             a pair of double quotes.

Words        Any sequence of letters or digits 0 to 9 beginning with a
             letter. A letter is defined as any of A to Z and a to z or
             \u00A0 to \u00FF. A word follows a whitespace character and is
             terminated by another whitespace character, or any character
             other than a letter or a digit.

Comments     Any sequence of characters beginning with a forward slash, /,
             and ending with the end-of-line character. Comments are
             ignored and not returned by the tokenizer.

Whitespace   All byte values from \u0000 to \u0020, which includes space,
             backspace, horizontal tab, vertical tab, line feed, form feed,
             and carriage return. Whitespace acts as a delimiter between
             tokens and is ignored (except within a quoted string).
To retrieve a token from the stream, you call the nextToken() method for the StreamTokenizer object:
int tokenType = 0;
try {
  // The inner parentheses matter: store the token type first, then compare it with TT_EOF
  while ((tokenType = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
    // Do something with the token...
  }
} catch (IOException e) {
  e.printStackTrace(System.err);
  System.exit(1);
}
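To make the token kinds listed earlier concrete, here is one way the "do something with the token" step might look. It is a sketch under stated assumptions, not the original example: the class TokenKinds and the method describe() are hypothetical names, and a StringReader stands in for keyboard input so the result is repeatable. The fields ttype, sval, and nval and the constants TT_NUMBER, TT_WORD, and TT_EOF are members of StreamTokenizer.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenKinds {
    // Hypothetical helper: returns one short description per token in text,
    // using a StreamTokenizer left in its default state.
    static List<String> describe(String text) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        List<String> found = new ArrayList<>();
        int tokenType;
        while ((tokenType = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
            switch (tokenType) {
                case StreamTokenizer.TT_NUMBER:
                    found.add("number " + tokenizer.nval);  // value as a double
                    break;
                case StreamTokenizer.TT_WORD:
                    found.add("word " + tokenizer.sval);
                    break;
                case '"':
                case '\'':
                    // For a quoted string, the token type is the quote
                    // character and sval holds the text between the quotes.
                    found.add("string " + tokenizer.sval);
                    break;
                default:
                    // Any other character is returned as its own token type.
                    found.add("char " + (char) tokenType);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String input = "width 12.5 \"two words\" = / ignored comment";
        for (String description : describe(input)) {
            System.out.println(description);
        }
        // Prints:
        // word width
        // number 12.5
        // string two words
        // char =
    }
}
```

Everything after the forward slash is treated as a comment and never reaches the loop, and the whitespace between tokens is discarded, which matches the default behavior described in the table.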