Internet Addresses - Java Network Programming


The name of the file to be processed is passed to Weblog as the first argument on the command line. A FileInputStream fin is opened from this file and an InputStreamReader is chained to fin. This InputStreamReader is buffered by chaining it to an instance of the BufferedReader class. The file is processed line by line in a for loop. Each pass through the loop places one line in the String variable entry. entry is then split into two substrings: ip, which contains everything before the first space, and theRest, which is everything from the first space to the end of the string. The position of the first space is determined by entry.indexOf(" "). The substring ip is converted to an InetAddress object using getByName(). getHostName() then looks up the hostname. Finally, the hostname and everything else on the line (theRest) are printed on System.out. Output can be sent to a new file through the standard means for redirecting output.
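The steps just described can be sketched roughly as follows. This is a minimal reconstruction from the prose, not the book's listing: the class name WeblogSketch, the helper methods splitEntry() and resolve(), the UTF-8 charset, and the handling of lines that contain no space are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

final class WeblogSketch {

    // Splits an entry at the first space. Element 0 is the address (ip);
    // element 1 is theRest, which keeps its leading space.
    // Assumption: a line with no space is treated as a bare address.
    static String[] splitEntry(String entry) {
        int index = entry.indexOf(" ");
        if (index < 0) {
            return new String[] { entry, "" };
        }
        return new String[] { entry.substring(0, index), entry.substring(index) };
    }

    // Converts the address text to an InetAddress and asks for its hostname.
    // Assumption: if the text cannot be resolved, it is printed unchanged.
    static String resolve(String ip) {
        try {
            return InetAddress.getByName(ip).getHostName();
        } catch (UnknownHostException ex) {
            return ip;
        }
    }

    public static void main(String[] args) throws IOException {
        // FileInputStream -> InputStreamReader -> BufferedReader, as in the text.
        try (FileInputStream fin = new FileInputStream(args[0]);
             BufferedReader bin = new BufferedReader(
                     new InputStreamReader(fin, StandardCharsets.UTF_8))) {
            for (String entry = bin.readLine(); entry != null; entry = bin.readLine()) {
                String[] parts = splitEntry(entry);
                System.out.println(resolve(parts[0]) + parts[1]);
            }
        }
    }
}
```

Run as, for example, java WeblogSketch access.log > resolved.log to redirect the output to a new file (the filenames here are placeholders).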

Weblog is more efficient than you might expect. Most web browsers generate multiple logfile entries per page served, because there's an entry in the log not just for the page itself but for each graphic on the page. And many visitors request multiple pages while visiting a site. DNS lookups are expensive and it simply doesn't make sense to look up each site every time it appears in the logfile. The InetAddress class caches requested addresses. If the same address is requested again, it can be retrieved from the cache much more quickly than from DNS.
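To make the benefit of caching concrete, here is a small illustrative sketch of the general idea: remember each answer the first time it is computed and reuse it afterwards. It is not how InetAddress is implemented internally; the class LookupCacheSketch and its injectable resolver function are hypothetical, chosen so the behavior can be seen without touching the network. (The lifetime of the real InetAddress cache is governed by JVM security properties such as networkaddress.cache.ttl.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class LookupCacheSketch {

    // Maps an address string to the hostname found for it.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the expensive lookup (for example, a DNS query).
    private final Function<String, String> resolver;

    LookupCacheSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Calls the resolver only the first time an address is seen;
    // later requests for the same address are answered from the map.
    String hostFor(String address) {
        return cache.computeIfAbsent(address, resolver);
    }

    public static void main(String[] args) {
        LookupCacheSketch cache = new LookupCacheSketch(address -> {
            System.out.println("resolving " + address);
            return "host-for-" + address;
        });
        cache.hostFor("192.0.2.7"); // invokes the resolver
        cache.hostFor("192.0.2.7"); // answered from the map, resolver not invoked
    }
}
```

A logfile in which the same visitor appears on many consecutive lines is exactly the case where this pattern pays off: only the first occurrence of each address costs a real lookup.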

Nonetheless, this program could certainly be faster. In my initial tests, it took more than a second per log entry. (Exact numbers depend on the speed of your network connection, the speed of the local and remote DNS servers, and network congestion when the program is run.) The program spends a huge amount of time sitting and waiting for DNS requests to return. Of course, this is exactly the problem multithreading is designed to solve. One main thread can read the logfile and pass off individual entries to other threads for processing.

A thread pool is absolutely necessary here. Over the space of a few days, even low-volume web servers can generate a logfile with hundreds of thousands of lines. Trying to process such a logfile by spawning a new thread for each entry would rapidly bring even the strongest virtual machine to its knees, especially because the main thread can read logfile entries much faster than individual threads can resolve domain names and die. Consequently, reusing threads is essential. The number of threads is stored in a tunable parameter, numberOfThreads, so that it can be adjusted to fit the VM and network stack. (Launching too many simultaneous DNS requests can also cause problems.)
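The thread-reuse idea can be sketched with a fixed-size pool from java.util.concurrent. This is only an outline under stated assumptions, not the book's two-class program: the class PooledLookupSketch, the method processAll(), and the injectable resolver function are hypothetical, and for simplicity the sketch holds every pending result in memory, which a real tool processing hundreds of thousands of lines might want to avoid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PooledLookupSketch {

    // Resolves the leading address of each entry on a pool of numberOfThreads
    // reusable threads and returns the rewritten entries in their original order.
    static List<String> processAll(List<String> entries, int numberOfThreads,
                                   Function<String, String> resolver)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String entry : entries) {
                // Each entry becomes one small task; threads are reused across tasks.
                Callable<String> task = () -> {
                    int index = entry.indexOf(' ');
                    if (index < 0) {
                        return resolver.apply(entry); // assumption: bare address line
                    }
                    return resolver.apply(entry.substring(0, index)) + entry.substring(index);
                };
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            pool.shutdown(); // let the worker threads exit once the queue is drained
        }
    }

    public static void main(String[] args) throws Exception {
        // A fake resolver keeps the demonstration off the network.
        List<String> entries = Arrays.asList("192.0.2.7 GET /", "192.0.2.8 GET /logo.gif");
        for (String line : processAll(entries, 4, address -> "host-" + address)) {
            System.out.println(line);
        }
    }
}
```

In real use the resolver would wrap InetAddress.getByName(address).getHostName(), and numberOfThreads would be tuned as described above.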

This program is now divided into two classes. The first class, LookupTask, shown in Example 4-11, is a Callable that parses a logfile entry, looks up a single address, and replaces that address with the corresponding hostname. This doesn't seem like a lot of work, and CPU-wise it isn't. However, because it involves a network connection, and
