by a person. An example of a current search directory is the Open Directory Project at
http://www.dmoz.org. It contains a hierarchy of topics and sites related to each topic.
In this project, anyone can volunteer to be an editor and site reviewer. There is no cost
to submit your site to the Open Directory Project. An added benefit of being listed in
the Open Directory Project is that the database containing the approved sites is used by
a number of search engines, including Google, Ask.com, and AOL.
13.3 Components of a Search Engine
Search engines have the following components:
Robot
Database (also used by search directories)
Search form (also used by search directories)
Robot
A robot (sometimes called a spider or bot) is a program that automatically traverses the
hypertext structure of the Web by retrieving a Web page document and following the
hyperlinks on the page. It moves across the Web much like a spider, accessing and
documenting Web pages. The robot categorizes the pages and stores information about the
Web site and the Web pages in a database. Various robots may work differently, but in
general, they access and may store the following sections of Web pages: title, meta tag
keywords, meta tag descriptions, and some of the text on the page (usually either the
first few sentences or the text contained in heading tags). Visit The Web Robots Pages
at http://www.robotstxt.org if you'd like more details about Web robots.
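As a rough illustration of the behavior described above, the following Python sketch parses a page for the sections a robot commonly stores (title, meta tag keywords, meta tag description, and heading text) and then follows the page's hyperlinks breadth-first. It is a minimal sketch, not how any particular search engine's robot works: the `PageInfoParser` class, the `crawl()` function, and its `max_pages` parameter are hypothetical names, and the `fetch` argument is an assumed stand-in for an HTTP request so the example runs without network access. Only the standard library's `html.parser` and `urllib.parse` modules are used.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageInfoParser(HTMLParser):
    """Collects the page sections a robot typically stores, plus the page's links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = ""
        self.description = ""
        self.headings = []
        self.links = []
        self._capturing = None  # "title" or a heading tag whose text we are collecting

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "keywords":
                self.keywords = attrs.get("content") or ""
            elif name == "description":
                self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "title" or tag in HEADING_TAGS:
            self._capturing = tag
            if tag in HEADING_TAGS:
                self.headings.append("")

    def handle_endtag(self, tag):
        if tag != self._capturing:
            return
        if tag == "title":
            self.title = self.title.strip()
        else:
            self.headings[-1] = self.headings[-1].strip()
        self._capturing = None

    def handle_data(self, data):
        if self._capturing == "title":
            self.title += data
        elif self._capturing in HEADING_TAGS:
            self.headings[-1] += data


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first traversal: retrieve a page, record its info, queue its links."""
    index, seen, queue = {}, set(), deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)  # assumed: returns the page's HTML, or None if unavailable
        if html is None:
            continue
        parser = PageInfoParser()
        parser.feed(html)
        parser.close()
        index[url] = {
            "title": parser.title,
            "keywords": parser.keywords,
            "description": parser.description,
            "headings": parser.headings,
        }
        # Resolve relative hrefs against the current page before queuing them.
        queue.extend(urljoin(url, href) for href in parser.links)
    return index


# Usage with an in-memory "Web" standing in for real HTTP requests (hypothetical pages).
pages = {
    "http://example.com/": (
        '<html><head><title>Home</title>'
        '<meta name="keywords" content="html, css">'
        '<meta name="description" content="A sample site"></head>'
        '<body><h1>Welcome</h1><a href="/about.html">About</a></body></html>'
    ),
    "http://example.com/about.html": (
        '<title>About</title><h2>Our team</h2><a href="/">Home</a>'
    ),
}
index = crawl("http://example.com/", pages.get)
print(index["http://example.com/about.html"]["headings"])
```

The `seen` set keeps the robot from revisiting a page when two pages link to each other, as the home and about pages do here. A well-behaved robot would also honor each site's robots.txt file before retrieving pages; that check is left out of this sketch.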
Database
A database is a collection of information organized so that its contents can easily be
accessed, managed, and updated. Database management systems (DBMSs) such as
Oracle, Microsoft SQL Server, or IBM DB2 are used to configure and manage the database.
The Web page that displays your search results contains information drawn from the
database used by the search engine site. According to
http://www.bruceclay.com/searchenginerelationshipchart.htm, some search engines receive
portions of their content from other search engines. For example, AOL Search receives
its primary content from Google.
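To make the idea concrete, here is a minimal Python sketch that stores the kind of information a robot collects and looks pages up by keyword. It uses SQLite through Python's built-in `sqlite3` module as a lightweight stand-in for a production DBMS such as Oracle, SQL Server, or DB2. The two-table schema and the `build_database()` and `search()` function names are hypothetical, and real search engines use far more elaborate storage and ranking. The `search()` function plays roughly the role of the server-side script that receives the keywords typed into a search form.

```python
import sqlite3


def build_database(pages):
    """Store robot-collected page info in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT, description TEXT)")
    # One row per (keyword, page) pair so lookups by keyword can use an index.
    conn.execute("CREATE TABLE keywords (word TEXT, url TEXT REFERENCES pages(url))")
    conn.execute("CREATE INDEX idx_keywords_word ON keywords(word)")
    for url, info in pages.items():
        conn.execute(
            "INSERT INTO pages (url, title, description) VALUES (?, ?, ?)",
            (url, info["title"], info["description"]),
        )
        # Meta tag keywords are comma-separated; normalize them to lowercase.
        for word in info["keywords"].split(","):
            word = word.strip().lower()
            if word:
                conn.execute("INSERT INTO keywords (word, url) VALUES (?, ?)", (word, url))
    conn.commit()
    return conn


def search(conn, query):
    """Return (url, title) rows for pages tagged with every keyword in the query."""
    words = sorted({w.lower() for w in query.split()})
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = (
        "SELECT p.url, p.title FROM pages AS p "
        "JOIN keywords AS k ON k.url = p.url "
        f"WHERE k.word IN ({placeholders}) "
        "GROUP BY p.url, p.title "
        "HAVING COUNT(DISTINCT k.word) = ? "
        "ORDER BY p.url"
    )
    # Keywords are passed as bound parameters, never pasted into the SQL text.
    return conn.execute(sql, (*words, len(words))).fetchall()


# Usage with hypothetical pages shaped like the robot output sketched earlier.
pages = {
    "http://example.com/": {
        "title": "Home", "description": "A sample site", "keywords": "html, css"},
    "http://example.com/forms.html": {
        "title": "Forms", "description": "Form basics", "keywords": "HTML, forms"},
}
conn = build_database(pages)
print(search(conn, "html"))        # both pages match
print(search(conn, "HTML forms"))  # only forms.html has both keywords
```

Storing one row per keyword, rather than scanning each page's text on every query, lets a keyword lookup use the index on `keywords.word` instead of reading every stored page.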
Search Form
The search form is the component of a search engine that you are most familiar with.
You have probably used a search engine many times but haven't thought about what
goes on “under the hood.” The search form is the graphical user interface that allows a
user to type in a word or phrase to search for. It is usually simply a text box and a
submit button. The visitor to the search engine types words (called keywords) related to his
or her search into the text box. When the form is submitted, the data typed into the
text box is sent to a server-side script that searches the database using the keywords
entered. The search results (also called a result set) are a list of information, such as the
URLs for Web pages, that meet your criteria. This result set is formatted with a link to
 