}
}
}
If you run this code, the first pattern (with the wildcard character .) always matches, whereas
the second pattern (with $) matches only when the MULTILINE flag (Pattern.MULTILINE) is set:
> java regex.NLMatch
INPUT: I dream of engines
more engines, all day long
PATTERN engines
more engines
DEFAULT match true
MULTILINE match: true
PATTERN engines$
DEFAULT match false
MULTILINE match: true
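The behavior of $ in the run above can be reproduced in isolation. The following is a minimal sketch, not the NLMatch program itself: the class name DollarAnchorDemo and the helper found() are made up for this illustration, and only the standard java.util.regex API is assumed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the NLMatch listing.
class DollarAnchorDemo {

    /** Returns true if regex is found anywhere in input when compiled with the given flags. */
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Same two-line input as in the run shown above.
        String input = "I dream of engines\nmore engines, all day long";

        // Default mode: $ anchors only at the end of the whole input.
        System.out.println("DEFAULT match " + found("engines$", 0, input));

        // MULTILINE mode: $ also anchors just before each line terminator.
        System.out.println("MULTILINE match " + found("engines$", Pattern.MULTILINE, input));
    }
}
```

In default mode the input ends with "long", so engines$ has nothing to anchor to; with Pattern.MULTILINE the end of the first line counts as well.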
Program: Apache Logfile Parsing
The Apache web server is the world's leading web server and has been for most of the Web's
history. It is one of the world's best-known open source projects, and the first of many
fostered by the Apache Software Foundation. The name Apache is often claimed to be a pun on the
origins of the server: its developers began with the free NCSA server and kept hacking at it
or “patching” it until it did what they wanted. When it was sufficiently different from the
original, a new name was needed. Because it was now “a patchy server,” the name Apache was
chosen. Officialdom denies the story, but it's cute anyway. One place actual patchiness does
show through is in the logfile format. Consider Example 4-8.
Example 4-8. Apache log file excerpt
123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] "GET /java/javaResources.html
HTTP/1.0" 200 10450 "-" "Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)"
The file format was obviously designed for human inspection but not for easy parsing. The
problem is that different delimiters are used: square brackets for the date, quotes for the
request line, and spaces sprinkled all through. Consider trying to use a StringTokenizer; you
might be able to get it working, but you'd spend a lot of time fiddling with it. However, this
somewhat contorted regular expression [21] makes it easy to parse:
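As an independent illustration of the idea (and not necessarily the expression the text refers to), here is one way a single regular expression can cope with the mixed delimiters. It is a sketch that assumes every line has exactly the nine fields seen in Example 4-8; the class name LogLineSketch, the pattern, and the parse() helper are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the pattern is an assumption based only on the line in Example 4-8.
class LogLineSketch {

    static final Pattern LOG_LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) "           // client address, identity, user name
        + "\\[([^\\]]+)\\] "               // date and time inside square brackets
        + "\"([^\"]*)\" "                  // request line inside double quotes
        + "(\\d{3}) (\\d+|-) "             // status code, byte count (or "-")
        + "\"([^\"]*)\" \"([^\"]*)\"$");   // referrer and user agent, each quoted

    /** Returns the nine captured fields, or null if the line does not fit the pattern. */
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);    // capture groups are numbered from 1
        }
        return fields;
    }

    public static void main(String[] args) {
        // The entry from Example 4-8, joined back into a single line.
        String line = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] "
            + "\"GET /java/javaResources.html HTTP/1.0\" 200 10450 \"-\" "
            + "\"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\"";
        String[] fields = parse(line);
        if (fields == null) {
            System.out.println("No match");
        } else {
            System.out.println("Client:  " + fields[0]);
            System.out.println("Date:    " + fields[3]);
            System.out.println("Request: " + fields[4]);
            System.out.println("Status:  " + fields[5]);
            System.out.println("Agent:   " + fields[8]);
        }
    }
}
```

Each delimiter style gets its own fragment of the pattern, so the brackets inside the user agent string cause no trouble: that field is bounded by quotes, not by brackets. A quoted field that itself contains an escaped double quote would defeat this sketch.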
 