Metadata and the Semantic Web - Web Standards: Mastering HTML5, CSS3, and XML


• Meta schemes specify a semantic framework defining the meaning of the key and its value (prior to HTML5). They can also prevent potential ambiguity. Listing 7-3 shows an example.

Listing 7-3. A Meta Scheme

In this case, the meta scheme is Dublin Core (DC).
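
As a minimal sketch of this kind of markup, a pre-HTML5 meta element might carry a scheme attribute as shown below. The name and content values are illustrative assumptions, not values taken from Listing 7-3. The scheme attribute is obsolete in HTML5.

```html
<!-- Hypothetical values: scheme names the framework used to interpret name and content -->
<meta scheme="DC" name="date" content="2011-06-15" />

<!-- Without a scheme, a bare identifier could be ambiguous; here it is marked as an ISBN -->
<meta scheme="ISBN" name="identifier" content="0-8230-2355-9" />
```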

The language, keywords, description, and robots meta names contribute to more precise web searches by defining the document language, the most relevant keywords, and a short description. The last of these, robots, provides control over search engine behavior to a limited extent [23]. Web pages can be prevented from being indexed (noindex), from having their links followed (nofollow), from being cached (noarchive), from being described by a snippet (nosnippet), or from being described according to the Open Directory Project (noodp) [24]. The combination of the noindex and nofollow values can be replaced by the value none [25]. This setting can be used, for example, for confidential documents whose content and links should not be indexed by search engines.¹ Web page descriptions retrieved from the ODP and used by Google, Yahoo!, and Bing can be disallowed specifically. The meta name to be applied is Googlebot for Google, Slurp for Yahoo!, and msnbot for Bing (Listing 7-4).
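
The robots values described in this paragraph can be sketched as follows. These are alternatives written in XHTML5 syntax for illustration; a real page would normally carry only one robots declaration.

```html
<!-- Ask all crawlers not to index the page and not to follow its links -->
<meta name="robots" content="noindex, nofollow" />

<!-- Shorthand for the same combination -->
<meta name="robots" content="none" />

<!-- Ask crawlers not to cache the page and not to show a snippet for it -->
<meta name="robots" content="noarchive, nosnippet" />
```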

Listing 7-4. meta Tags for Different Crawlers
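
A minimal sketch of such crawler-specific tags, assuming the noodp value is the one being applied (the preceding text discusses ODP descriptions), might look like this:

```html
<!-- Each name addresses one crawler; the content value applies only to that crawler -->
<meta name="Googlebot" content="noodp" />
<meta name="Slurp" content="noodp" />
<meta name="msnbot" content="noodp" />
```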

If you want to prevent the descriptions and titles retrieved from the Yahoo! Directory from being displayed in search results, you can use the noydir value [26] (Listing 7-5).

Listing 7-5. Using the noydir Attribute Value
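
As a sketch, and assuming the tag is addressed to the Yahoo! crawler by the Slurp name mentioned earlier, the value could be applied like this:

```html
<!-- Ask Yahoo! not to use Yahoo! Directory titles and descriptions for this page -->
<meta name="Slurp" content="noydir" />
```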

In spite of the variety of attribute values, using meta tags to prevent search engine indexing or crawling is not the best solution. The robots.txt file should be used for this purpose instead.

The typical general metadata provided in the head section of web documents looks like Listing 7-6.

Listing 7-6. A Complete Example for meta Tags in XHTML5

harness, dog lead, dog kennel, dog bowl, dog coats" />

Since the value of the name attribute on the meta element is robots, the value of the content attribute (index, follow) applies to all search engines rather than to a specific one.
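
To illustrate how these general meta names fit together, the following is a minimal sketch of a head section in XHTML5 syntax. The title, language, keywords, and description values are hypothetical placeholders; only the robots value (index, follow) comes from the text above.

```html
<head>
  <title>Example page</title>
  <!-- Hypothetical placeholder values for the general metadata -->
  <meta name="language" content="en" />
  <meta name="keywords" content="first keyword, second keyword, third keyword" />
  <meta name="description" content="A short description of the page." />
  <!-- name="robots" addresses all crawlers rather than a specific one -->
  <meta name="robots" content="index, follow" />
</head>
```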

1 There are other techniques to achieve similar results. For example, web documents contained in a directory that is disallowed in robots.txt will usually be excluded from search results.
