HTML and CSS Reference
Extracting Semantic Content
The semantic content of websites can be checked with the W3C Semantic Data Extractor [31]. It can extract semantic data
such as the following:
• Generic metadata
  • Title, author, and description provided in the document head
  • RDFa metadata embedded in the document body (also generated in RDF/XML)
• Related resources
  • Linked files, for example, RSS or Atom news feeds
  • Glossary, copyright, and bookmarkable points provided in the document head
• Outline of the document
• Quotes and citations
Menu points and URIs are provided with hyperlinks.
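To make the kinds of data listed above more concrete, the following minimal sketch pulls the title, the author and description metadata, and any linked RSS or Atom feeds out of a document head. It uses only Python's standard html.parser module and does not reflect how the W3C Semantic Data Extractor is implemented. The class name HeadMetadataParser and the sample markup are hypothetical, chosen for illustration only.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects the title, selected meta fields, and feed links (illustrative only)."""

    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}    # for example {"author": "...", "description": "..."}
        self.feeds = []   # list of (media type, href) tuples
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("author", "description"):
                self.meta[name] = attrs.get("content") or ""
        elif tag == "link":
            # rel may hold several space-separated tokens
            rel = (attrs.get("rel") or "").lower().split()
            link_type = (attrs.get("type") or "").lower()
            if "alternate" in rel and link_type in self.FEED_TYPES:
                self.feeds.append((link_type, attrs.get("href")))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


# Hypothetical sample document used only to exercise the parser
sample = """<!DOCTYPE html>
<html>
<head>
  <title>Example Page</title>
  <meta name="author" content="Jane Doe">
  <meta name="description" content="A sample page.">
  <link rel="alternate" type="application/atom+xml" href="/feed.atom">
  <link rel="stylesheet" href="style.css">
</head>
<body><p>Hello</p></body>
</html>"""

parser = HeadMetadataParser()
parser.feed(sample)
parser.close()

print(parser.title)
print(parser.meta)
print(parser.feeds)
```

The sketch deliberately ignores RDFa attributes, the document outline, and quotations. A full extractor would need considerably more logic to cover them.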
Another comprehensive semantic data extractor tool is the Sindice Web Data Inspector at http://inspector.sindice.com [32].
The tool can be used to extract RDF triples from markup, RDF/XML, Turtle, or N3 documents
provided either by URI or by direct input. Sindice Web Data Inspector can be used for retrieving semantic data
(Inspect button), combined semantic data extraction and validation (Inspect + Validate button), or ontology analysis
and reasoning (Figure 14-14).
Figure 14-14. Comprehensive options on the start screen of Sindice Web Data Inspector