# A trap entry in robots.txt: no legitimate crawler should ever request this path
Disallow: /hidden
Then check your server logs to see which IPs have actually loaded that file, and what other files those IPs have requested. If it's just a few files, widely separated in time, I'd ignore it. (A short log-scanning sketch appears after the configuration example below.) But if I see that an IP address has been hitting every other page on my site, I'll block it in my Apache configuration file (httpd.conf), like so:
<Directory "/www/xom">
    # Apache 2.2-style access control: allow everyone by default,
    # then deny the specific spider IPs.
    Order allow,deny
    Allow from all
    Deny from 212.0.138.30
    Deny from 83.149.74.179
    Deny from 66.186.173.166
</Directory>
This prevents it from hitting any page on my site, not just the protected ones. However, chances are that spider
is up to no good, so I don't mind doing this.
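To find candidates for that deny list, here is a minimal log-scanning sketch. It assumes the standard combined log format, in which the client IP is the first field and the requested path sits inside the quoted request line; the log path is a placeholder you should adjust for your server.

from collections import Counter

TRAP = "/hidden"                        # the disallowed trap path from robots.txt
LOG = "/var/log/apache2/access_log"     # assumed location; adjust for your server

hits = Counter()
with open(LOG) as log:
    for line in log:
        fields = line.split()
        # Combined log format: IP - - [date tz] "GET /path HTTP/1.1" ...
        # so fields[0] is the client IP and fields[6] is the requested path.
        if len(fields) > 6 and fields[6].startswith(TRAP):
            hits[fields[0]] += 1

for ip, count in hits.most_common():
    print(f"{count:6d}  {ip}")

Any IP this prints has fetched a file that robots.txt explicitly told it not to, which is exactly the behavior described above.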
You can also use mod_rewrite to block robots by User-agent. However, the User-agent string is so easy to change that I rarely bother with this: a spider that's already ignoring robots.txt is hardly going to hesitate to fake its User-agent string to look exactly like a perfectly legitimate copy of Firefox or Internet Explorer.
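For completeness, such a rule takes only a few lines of configuration. This is a minimal sketch in which "BadBot" stands in for whatever User-agent substring you want to block:

RewriteEngine On
# Match the offending User-agent substring, case-insensitively.
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
# Return 403 Forbidden for every request from that agent.
RewriteRule ^ - [F]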
Some people have automated this procedure or used other means of detection. In particular, a client hitting your site more than 12 times per minute may well be up to no good. However, detecting that requires quite a bit more server intelligence than simple IP blocking.
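As a rough sketch of that kind of detection, the following script buckets requests by IP and by minute and reports any client that exceeds the 12-requests-per-minute threshold mentioned above. It again assumes the combined log format and a hypothetical log path, and its fixed one-minute buckets are a simplification of a true sliding window.

from collections import Counter

LOG = "/var/log/apache2/access_log"   # assumed location; adjust for your server
THRESHOLD = 12                        # requests per minute

counts = Counter()
with open(LOG) as log:
    for line in log:
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[3] looks like "[12/Mar/2008:10:31:07"; dropping the
        # seconds yields a per-minute bucket key.
        ip, stamp = fields[0], fields[3]
        minute = stamp[:stamp.rfind(":")]
        counts[(ip, minute)] += 1

suspects = {ip for (ip, minute), n in counts.items() if n > THRESHOLD}
for ip in sorted(suspects):
    print(ip)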