HTML and CSS Reference
Extracting Semantic Content
The semantic content of websites can be checked with the W3C Semantic Data Extractor. It can extract semantic data
such as the following:
Title, author, and description provided in the document head
RDFa metadata embedded in the document body (also generated in RDF/XML)
Linked files, for example, RSS or Atom news feeds
Glossary, copyright, and bookmarkable points provided in the document head
Outline of the document
Quotes and citations
Menu points and URIs are provided with hyperlinks.
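To make the first and third items of the list above more concrete, the following is a minimal sketch of how the title, the meta fields, and the feed links of a document head could be collected with Python's standard html.parser module. It is not how the W3C Semantic Data Extractor is implemented; the names HeadMetadataParser and extract_head_metadata are hypothetical and chosen only for illustration, and RDFa, outlines, and citations are not handled.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects the title, named meta fields, and RSS/Atom feed links.

    Hypothetical helper for illustration; not part of any W3C tool.
    """

    # MIME types that identify news feeds in <link> elements
    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # e.g. {"author": "...", "description": "..."}
        self.feed_links = []  # href values of RSS/Atom <link> elements
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                # Meta names are case-insensitive, so normalize them
                self.meta[name.lower()] = content
        elif tag == "link":
            link_type = (attrs.get("type") or "").lower()
            if link_type in self.FEED_TYPES and attrs.get("href"):
                self.feed_links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def extract_head_metadata(html):
    """Return the title, meta fields, and feed links found in an HTML string."""
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "feeds": parser.feed_links,
    }
```

Calling extract_head_metadata on the markup of a page returns a dictionary whose "meta" entry holds fields such as author and description, and whose "feeds" entry lists the URIs of any linked RSS or Atom feeds.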
Another comprehensive semantic data extractor is the Sindice Web Data Inspector at
http://inspector.sindice.com. The tool can be used to extract RDF triples from markup, RDF/XML, Turtle, or N3 documents
provided either by URI or by direct input. Sindice Web Data Inspector can be used for retrieving semantic data
(Inspect button), combined semantic data extraction and validation (Inspect + Validate button), or ontology analysis
and reasoning (Figure 14-14).
Figure 14-14. Comprehensive options on the start screen of Sindice Web Data Inspector