The implements clause refers to an interface that merely defines the input string; it was used in a
demonstration comparing the regular expression version with a version that uses a StringTokenizer.
The source for both versions is in the online source for this chapter. Running the program
against the sample input from Example 4-8 gives this output:
Using regex Pattern:
^([\d.]+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(.+?)" (\d{3}) (\d+) "([^"]+)" "([^"]+)"
Input line is:
123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] "GET /java/javaResources.html HTTP/1.0" 200 10450 "-" "Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)"
IP Address: 123.45.67.89
Date&Time: 27/Oct/2000:09:27:09 -0400
Request: GET /java/javaResources.html HTTP/1.0
Response: 200
Bytes Sent: 10450
Browser: Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)
The program successfully parsed the entire log entry with one call to matcher.matches().
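For readers without the online source at hand, here is a minimal sketch of the regex version, built from the pattern printed above. It is not the chapter's own source file: the class and method names (LogRegexDemo, parse) are hypothetical, and printing the referer only when it is not "-" is an assumption chosen to reproduce the output shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: a sketch of the regex-based log parser, not the book's source
public class LogRegexDemo {

    // The pattern printed in the program output, written as a Java string literal
    static final String LOG_ENTRY_PATTERN =
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] " +
        "\"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"";

    // Compile once and reuse for every line
    static final Pattern PATTERN = Pattern.compile(LOG_ENTRY_PATTERN);

    /** Returns capture groups 1..n as an array, or null if the whole line does not match. */
    public static String[] parse(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.matches()) {            // matches() must consume the entire line
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);  // group 0 is the whole match, so start at 1
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] " +
            "\"GET /java/javaResources.html HTTP/1.0\" 200 10450 \"-\" " +
            "\"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\"";
        String[] f = parse(line);
        if (f == null) {
            System.err.println("Line did not match the log pattern");
            return;
        }
        System.out.println("IP Address: " + f[0]);
        System.out.println("Date&Time: " + f[3]);
        System.out.println("Request: " + f[4]);
        System.out.println("Response: " + f[5]);
        System.out.println("Bytes Sent: " + f[6]);
        if (!f[7].equals("-")) {         // "-" means no referer was recorded
            System.out.println("Referer: " + f[7]);
        }
        System.out.println("Browser: " + f[8]);
    }
}
```

Because matches() has to account for the whole line, a truncated or malformed entry simply yields null rather than a set of partially filled fields.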
Program: Data Mining
Suppose that I, as a published author, want to track how my book is selling in comparison to
others. I can obtain this information for free just by clicking the page for my book on any of
the major bookseller sites, reading the sales rank number off the screen, and typing the number
into a file, but that's too tedious. As I wrote in the book that this example looks for,
“computers get paid to extract relevant information from files; people should not have to do
such mundane tasks.” This program uses the Regular Expressions API and, in particular,
newline matching to extract a value from an HTML page on the hypothetical
QuickBookShops.web website. It also reads from a URL object (see REST Web Service Client).
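Because the sales rank and its label sit on different lines of the page, the whole page has to be read into one String before a regex can match across the line break. A minimal sketch of that step follows, assuming Java 9 or later (for InputStream.readAllBytes) and a UTF-8 encoded page; the class name UrlSlurp is hypothetical, and this is not the recipe's own code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the book's source
public class UrlSlurp {

    /** Reads the whole resource behind url into one String, line breaks included. */
    public static String readAll(URL url) throws IOException {
        // openStream() works the same way for http:, https:, file: and jar: URLs
        try (InputStream in = url.openStream()) {
            // Assumes the page is UTF-8; a real page may declare another charset
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java UrlSlurp <absolute-url>");
            return;
        }
        String page = readAll(URI.create(args[0]).toURL());
        System.out.println("Read " + page.length() + " characters");
    }
}
```

Reading bytes and decoding once, rather than line by line, keeps the original newlines intact, which is exactly what a multiline pattern needs to see.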
The pattern to look for is something like this (bear in mind that the HTML may change at
any time, so I want to keep the pattern fairly general):
<b>QuickBookShop.web Sales Rank: </b>
26,252
</font><br>
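The extraction itself can be sketched as follows. This is an illustration under assumptions, not the program from the book: it uses Pattern.DOTALL so that `.` also matches line terminators (one way to get the newline matching mentioned above), and the class name SalesRankExtractor and the -1 "not found" convention are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: sketches pulling the sales rank out of the HTML fragment
public class SalesRankExtractor {

    // Label text, then anything (newlines included, thanks to DOTALL), then the
    // first run that starts with a digit and continues with digits or commas
    private static final Pattern RANK = Pattern.compile(
        "QuickBookShop\\.web Sales Rank:.*?(\\d[\\d,]*)", Pattern.DOTALL);

    /** Returns the sales rank found in html, or -1 if the label is absent. */
    public static int extractRank(String html) {
        Matcher m = RANK.matcher(html);
        if (!m.find()) {               // find() searches anywhere in the page
            return -1;
        }
        // Drop thousands separators before converting: "26,252" becomes 26252
        return Integer.parseInt(m.group(1).replace(",", ""));
    }

    public static void main(String[] args) {
        String sample = "<b>QuickBookShop.web Sales Rank: </b>\n26,252\n</font><br>";
        System.out.println(extractRank(sample)); // prints 26252
    }
}
```

Keeping the pattern loose (label, then anything, then the first number) tolerates changes in the markup between the label and the value, at the cost of possibly picking up an unrelated number if the rank ever disappears from the page.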