Pattern Matching with Regular Expressions - Java


The implements clause is for an interface that just defines the input string; it was used in a

demonstration to compare the regular expression mode with the use of a StringTokenizer.

The source for both versions is in the online source for this chapter. Running the program

against the sample input from Example 4-8 gives this output:

Using regex Pattern:

^([\d.]+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(.+?)" (\d{3}) (\d+) "([^"]+)" "([^"]+)"

Input line is:

123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] "GET /java/javaResources.html HTTP/1.0" 200 10450 "-" "Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)"

IP Address: 123.45.67.89

Date&Time: 27/Oct/2000:09:27:09 -0400

Request: GET /java/javaResources.html HTTP/1.0

Response: 200

Bytes Sent: 10450

Browser: Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)

The program successfully parsed the entire logfile format with one call to matcher.matches().
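
The single-call parse described above can be sketched roughly as follows. This is a minimal illustration, not the chapter's actual program: the class name LogLineSketch and the parse() helper are hypothetical, and the group-to-field mapping is inferred from the pattern and the output shown earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch of parsing one log line with one matches() call.
class LogLineSketch {
    // The pattern printed in the output above, escaped for a Java string literal.
    static final Pattern LOG_ENTRY = Pattern.compile(
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"");

    /** Returns the nine captured fields in order, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LOG_ENTRY.matcher(line);
        if (!m.matches()) {          // matches() requires the whole line to fit the pattern
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);   // group 0 is the entire match, so start at 1
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] "
            + "\"GET /java/javaResources.html HTTP/1.0\" 200 10450 \"-\" "
            + "\"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\"";
        String[] f = parse(line);
        if (f == null) {
            System.err.println("Bad log entry (or problem with regex?)");
            return;
        }
        System.out.println("IP Address: " + f[0]);
        System.out.println("Date&Time: " + f[3]);
        System.out.println("Request: " + f[4]);
        System.out.println("Response: " + f[5]);
        System.out.println("Bytes Sent: " + f[6]);
        System.out.println("Browser: " + f[8]);
    }
}
```

Because matches() anchors the pattern to the entire input, a line with extra or missing fields is rejected outright rather than partially parsed.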

Program: Data Mining

Suppose that I, as a published author, want to track how my book is selling in comparison to

others. I can obtain this information for free just by clicking the page for my book on any of

the major bookseller sites, reading the sales rank number off the screen, and typing the number

into a file, but that's too tedious. As I wrote in the book that this example looks for,

“computers get paid to extract relevant information from files; people should not have to do

such mundane tasks.” This program uses the Regular Expressions API and, in particular,

newline matching to extract a value from an HTML page on the hypothetical

QuickBookShops.web website. It also reads from a URL object (see REST Web Service Client).

The pattern to look for is something like this (bear in mind that the HTML may change at

any time, so I want to keep the pattern fairly general):

<b>QuickBookShop.web Sales Rank: </b>

26,252

</font><br>
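
One way to keep the pattern general is to anchor only on the "Sales Rank:" label and let \s absorb whatever whitespace or line breaks separate the closing tag from the number. The following is a minimal sketch under that assumption; the class name SalesRankSketch, the extractRank() helper, and the exact pattern are hypothetical rather than the program's own, and it works on a string already read from the page instead of opening a URL.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch of pulling a sales rank out of HTML text.
class SalesRankSketch {
    // Assumed pattern: \s matches newlines as well as spaces, so the number
    // may sit on the line after the closing </b> tag.
    static final Pattern RANK =
        Pattern.compile("Sales Rank:\\s*</b>\\s*(\\d[\\d,]*)");

    /** Returns the rank as an int, or an empty OptionalInt if the label is not found. */
    static OptionalInt extractRank(String html) {
        Matcher m = RANK.matcher(html);
        if (!m.find()) {             // find() searches anywhere in the page text
            return OptionalInt.empty();
        }
        String digits = m.group(1).replace(",", "");  // "26,252" -> "26252"
        return OptionalInt.of(Integer.parseInt(digits));
    }

    public static void main(String[] args) {
        String html = "<b>QuickBookShop.web Sales Rank: </b>\n26,252\n</font><br>\n";
        extractRank(html).ifPresent(r -> System.out.println("Sales rank: " + r));
    }
}
```

Unlike the logfile case, this uses find() rather than matches(), since the fragment of interest is buried somewhere in a much larger page.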
