Web Applications - Refactoring HTML: Improving the Design of Existing Web Applications


GET /foo.html HTTP/1.1
If-None-Match: "6548d4-30a9e-c7f4e5c0"
Host: www.elharo.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Accept: application/xhtml+xml,text/html,text/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Keep-Alive: 300
Connection: keep-alive

If the server recognizes that the ETag matches the current ETag of the requested resource, it responds with a 304 Not Modified response, like this:

HTTP/1.1 304 Not Modified
Date: Mon, 09 Apr 2007 15:21:10 GMT

It really doesn't have to say anything else. In particular, the server does not send the body of the resource as it would if there were no ETag. Instead, the client loads its old copy of the resource from its cache. It can do this even if the expiration date of the resource in the cache has passed because it has checked with the server that the old representation is still fresh. For large documents, this can save significant bandwidth.
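The server-side decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name conditional_get and its parameters are hypothetical, headers are modeled as a plain dictionary, and weak validators (the W/ prefix) and ETags that themselves contain commas are ignored.

```python
def conditional_get(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET that may carry If-None-Match.

    Hypothetical helper for illustration; not part of any real framework.
    """
    # If-None-Match may list several comma-separated ETags, or "*".
    raw = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in raw.split(",")]

    if current_etag in candidates or "*" in candidates:
        # The client's cached copy is still current, so no body is sent.
        return 304, {"ETag": current_etag}, b""

    # No match, or no If-None-Match header at all: send the full resource.
    headers = {"ETag": current_etag, "Content-Length": str(len(body))}
    return 200, headers, body


etag = '"6548d4-30a9e-c7f4e5c0"'
status, headers, payload = conditional_get({"If-None-Match": etag}, etag, b"<html>...</html>")
print(status, len(payload))
```

When the ETag sent by the client equals the current one, the sketch returns a 304 status with an empty body; any other request gets the full body and the current ETag to store for next time.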

Web servers today send ETags for static files automatically, and you don't have to do anything extra for those. However, dynamic pages generated by PHP and similar frameworks are trickier. Sometimes every request to the server creates a different byte stream. If this is the case, don't bother sending an ETag. It will never help. However, many scripts sit somewhere in the middle. For example, suppose a script responds to a request for http://www.example.com/isbn/0691049548/ by making several SQL queries against a database and then formatting the results as HTML. If nothing in the database has changed, there's no need for clients to keep downloading the same data. However, clients won't know that unless the server gives them an ETag.

ETags versus Caching

ETags have a complex but partially orthogonal relationship to caching. Caches and cache control headers determine when a browser does or does not check back with a server before showing an old copy to the user. ETag headers come into play only after a browser has decided to check back with a server.
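The division of labor between the two mechanisms can be sketched as a client-side decision. This is a simplified model, not a description of any real browser: the function name next_step and its parameters are hypothetical, and freshness is reduced to comparing a cached copy's age against a single max-age value.

```python
def next_step(age_seconds, max_age_seconds, cached_etag):
    """Decide what a browser-like client does with a cached copy.

    Hypothetical, simplified model for illustration only.
    """
    if age_seconds < max_age_seconds:
        # Cache rules say the copy is fresh: no request is made,
        # so the ETag is never consulted.
        return "use cached copy without contacting the server"
    if cached_etag is not None:
        # The copy is stale: check back with the server, and only now
        # does the ETag matter.
        return "send GET with If-None-Match: " + cached_etag
    # Stale copy and no ETag to revalidate with.
    return "send plain GET"


print(next_step(10, 300, '"6548d4-30a9e-c7f4e5c0"'))
print(next_step(600, 300, '"6548d4-30a9e-c7f4e5c0"'))
```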

There are no special rules for how one constructs an ETag. Conceptually you can think of it like an MD5 or SHA-1 hash code for a document. However, because ETags do not need to be secure, and because these algorithms are computationally expensive, they are not the best choice for this purpose. Instead, consider what actually distinguishes one request from the next, and see if you can devise a simple hash code algorithm from that. For example, if the only difference between pages is in the SQL queries used to access a database, and if no new data is inserted into the database, you could form a hash code based on the SQL queries themselves. If the database is updated, but only occasionally, you might devise a scheme in which ETags are generated from the SQL queries plus a random identifier that is changed every time the database is modified. However, if the database is written to as frequently as it is read, you have to base the hash code on the data the queries return. Possibly, though, you could make it depend on just some of the fields in the response rather than all of them.
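The scheme for an occasionally updated database might look something like the following Python sketch. Everything named here is an assumption made for illustration: make_etag is a hypothetical helper, db_version stands in for the identifier that changes on every write, and CRC-32 is simply one inexpensive, non-cryptographic checksum that could serve as the hash.

```python
import zlib


def make_etag(queries, db_version):
    """Build a cheap ETag from the SQL text plus a token that changes
    whenever the database is modified.

    Hypothetical helper; CRC-32 is an assumed choice of cheap hash.
    """
    # Combine everything that distinguishes one response from another.
    material = "\n".join(queries) + "\n" + str(db_version)
    checksum = zlib.crc32(material.encode("utf-8"))
    # ETag values are quoted strings; eight hex digits is plenty here.
    return '"%08x"' % checksum


queries = ["SELECT title FROM books WHERE isbn = '0691049548'"]
before = make_etag(queries, db_version=1)
after = make_etag(queries, db_version=2)
print(before, after)
```

As long as the queries and the version token stay the same, the ETag stays the same and clients can be answered with 304 Not Modified. Changing the token after a write changes the ETag, which forces clients to fetch the new page.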

For example, I have published a simple PHP program that generates a plain text file containing Fibonacci numbers. The only relevant input is the number requested. Thus, I can make that the ETag, like so:
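The same idea can be sketched in Python. This is not the PHP program mentioned above, only an illustration of using the requested number as the ETag; the function name fibonacci_response, its parameters, and the one-number-per-line output format are all assumptions.

```python
def fibonacci_response(count, if_none_match=None):
    """Return (status, headers, body) for a request for `count` Fibonacci numbers.

    Hypothetical sketch: the requested count alone determines the output,
    so it can serve directly as the ETag.
    """
    etag = '"%d"' % count
    if if_none_match == etag:
        # The client already holds the output for this count.
        return 304, {"ETag": etag}, ""

    # Generate the first `count` Fibonacci numbers, one per line.
    a, b = 1, 1
    lines = []
    for _ in range(count):
        lines.append(str(a))
        a, b = b, a + b
    headers = {"ETag": etag, "Content-Type": "text/plain"}
    return 200, headers, "\n".join(lines) + "\n"


status, headers, body = fibonacci_response(5)
print(status, headers["ETag"])
print(body)
```

A later call such as fibonacci_response(5, if_none_match='"5"') skips the computation entirely and returns a 304 status with an empty body.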
