HTML and CSS Reference
if a robot wants to visit a web page at the URL http://www.example.com/info/about.html,
it must first check for the file http://www.example.com/robots.txt.
Suppose the robot finds the file, and it contains these statements:
User-agent: *
Disallow: /
The robot is done and will not index anything. The first declaration,
User-agent: *, means the following directives apply to all robots. The second,
Disallow: /, tells the robot that it should not visit any pages on the site, either
in the document root or its subdirectories.
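The check described above can be sketched with the urllib.robotparser module from Python's standard library. This is only an illustration of how a well-behaved robot might apply the two statements, not how any particular search engine is implemented. The robot name ExampleBot is hypothetical, and the rules are passed in directly instead of being downloaded from the server.

```python
from urllib.robotparser import RobotFileParser

# The two statements from the robots.txt file discussed above.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)  # parse the lines directly instead of fetching the file

# "ExampleBot" is a hypothetical robot name used only for illustration.
page = "http://www.example.com/info/about.html"
allowed = parser.can_fetch("ExampleBot", page)
print(allowed)  # False: Disallow: / covers every path on the site
```

Because the User-agent line is a wildcard, the result is the same whatever name the robot announces.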
There are three important considerations when using robots.txt:
• Robots can ignore the file. Bad robots that scan the Web for security
holes or harvest email addresses will pay it no attention.
• Robots cannot enter password-protected directories; only authorized
user agents can. It is not necessary to disallow robots from protected
directories.
• The robots.txt file is a publicly readable file. Anyone can see what
sections of your server you don't want robots to index.
The robots.txt file is useful in several circumstances:
• When a site is under development and doesn't have “real” content yet
• When a directory or file has duplicate or backup content
• When a directory contains scripts, stylesheets, includes, templates, and
so on
• When you don't want search engines to read your files
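As a rough sketch of the directory cases in the list above, the following Python snippet parses a robots.txt file that blocks only selected directories and then asks which paths a compliant robot may fetch. The directory names (/backup/, /scripts/, /templates/), the file names, and the robot name ExampleBot are all assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the directory names are assumptions for illustration.
rules = """\
User-agent: *
Disallow: /backup/
Disallow: /scripts/
Disallow: /templates/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

def is_allowed(path):
    """Return True if a well-behaved robot may fetch the given path."""
    return parser.can_fetch("ExampleBot", "http://www.example.com" + path)

print(is_allowed("/info/about.html"))  # True: not under a disallowed directory
print(is_allowed("/scripts/menu.js"))  # False: inside /scripts/
print(is_allowed("/backup/old.html"))  # False: inside /backup/
```

Each Disallow line matches by path prefix, so everything below a listed directory is excluded while the rest of the site stays open to robots.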
favicon.ico
Microsoft introduced the concept of a favorites icon. “Favorites” is
Microsoft's word for bookmarks in Internet Explorer. A favorites icon, or “favicon”
for short, is a small square icon associated with a particular website or web
page. All modern browsers support favicons in one way or another by
displaying them in the browser's address bar, tab labels, and bookmark listings.
favicon.ico is the default filename, but another name can be specified in a link
element in the document's head section.
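The fallback described above can be sketched in Python: look for a link element that names an icon, and use favicon.ico in the document root if none is found. This is a simplified illustration, not how any browser actually resolves icons. The markup, the path /images/site-icon.png, and the helper names IconLinkFinder and favicon_url are hypothetical, and the sketch assumes the common rel="icon" form of the link element without checking that it sits inside the head section.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class IconLinkFinder(HTMLParser):
    """Record the href of the first <link> element whose rel includes "icon"."""

    def __init__(self):
        super().__init__()
        self.icon_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.icon_href is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "icon" in rel_values and attributes.get("href"):
            self.icon_href = attributes["href"]

def favicon_url(page_url, html):
    """Return the icon URL for a page, falling back to /favicon.ico."""
    finder = IconLinkFinder()
    finder.feed(html)
    # No link element found: use the default filename in the document root.
    return urljoin(page_url, finder.icon_href or "/favicon.ico")

page = "http://www.example.com/info/about.html"
with_link = '<html><head><link rel="icon" href="/images/site-icon.png"></head></html>'
without_link = "<html><head><title>About</title></head></html>"

print(favicon_url(page, with_link))     # http://www.example.com/images/site-icon.png
print(favicon_url(page, without_link))  # http://www.example.com/favicon.ico
```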
 