Developing Semantic Portals

INTRODUCTION

A semantic portal is a type of community information portal that exploits semantic Web standards (Berners-Lee, Hendler, & Lassila, 2001) to improve structure, extensibility, customization, and sustainability. It is similar to a traditional Web portal, except that Web resources are indexed using a rich domain ontology (a specification of key domain concepts) rather than, for example, a list of keywords, and the portal is built on newer Web markup languages such as the Resource Description Framework (RDF) (Manola & Miller, 2004) and the Web Ontology Language (OWL) (McGuinness & van Harmelen, 2004). RDF provides a flexible and extensible format for describing information items and associated metadata, while OWL supports explicit representation of the domain ontologies used to classify and structure the items. Together, these enable a more decentralized approach to portal architectures. This chapter discusses comprehensive, ontology-based approaches for building high-value semantic portals. State-of-the-art development tools and techniques are first presented from both a client-side and a server-side perspective. Next, widely used methodologies and tools for building ontologies are discussed. Finally, a tool called Ontoviews is demonstrated, which has been designed to assist semantic portal developers by providing access to search and dynamic linking services.

CLIENT-SIDE DEVELOPMENT

The first stage in the information item life cycle in a semantic portal is the creation of information. An information item is generally created as an instance of an ontology class using an ontology-based annotator such as Cohse1, OntoMat2, or Shoe Knowledge Annotator3. These applications allow the information provider to create RDF markup and then associate the markup with a Web page. There is still no single standard method for associating RDF with HTML. Popular annotation methods include:

• Embedding RDF in HTML: This involves placing the RDF markup somewhere that it can be readily extracted while not being displayed by the browser. This may be done using the head tags or comment tags of the HTML document.

• Linking to an external document: This is arguably the purest solution from an architectural point of view. The RDF annotations are stored in a separate RDF file somewhere on the Web. The original HTML source document then contains a <link> to the annotation. One drawback of this method is that maintaining the metadata externally to the HTML source document can be an inconvenience.

• Embedding RDF as XHTML: This approach involves writing a small DTD (document type definition) for a variant of XHTML using XHTML Modularization, putting it on the Web, and then referencing it from the source document. The main drawback of this method is that the DTDs are large and relatively complex, so it is not a viable approach for typical HTML authors.

The most commonly used approach to annotation, however, is to embed the markup in the head or comment tags of an HTML file, as shown in Figure 1. The information can then be extracted by a Web crawling application and mediated with the ontology schema.
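As a rough illustration of the embedding approach, the following Python sketch pulls an RDF/XML block out of an HTML comment and lists its statements using only the standard library. The sample page, the example.org URI, and the helper name extract_triples are assumptions made for illustration; they are not taken from Figure 1 or from any of the annotators named above. The sketch handles only flat rdf:Description elements with literal values, so a real crawler would hand the extracted block to a full RDF parser instead.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical page: RDF/XML hidden in a comment inside <head>.
PAGE = """<html>
<head>
<title>Green Trust</title>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/green-trust">
    <dc:title>Green Trust</dc:title>
    <dc:subject>Conservation</dc:subject>
  </rdf:Description>
</rdf:RDF>
-->
</head>
<body><p>Welcome.</p></body>
</html>"""


def extract_triples(html):
    """Return (subject, predicate, literal) tuples for RDF/XML found in the page."""
    triples = []
    # Locate each rdf:RDF element, whether or not it sits inside a comment.
    for block in re.findall(r"<rdf:RDF\b.*?</rdf:RDF>", html, flags=re.DOTALL):
        root = ET.fromstring(block)
        for desc in root.findall(f"{{{RDF_NS}}}Description"):
            subject = desc.get(f"{{{RDF_NS}}}about")
            for prop in desc:
                # ElementTree writes tags as "{namespace}local"; join them into a URI.
                predicate = prop.tag[1:].replace("}", "")
                triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


for triple in extract_triples(PAGE):
    print(triple)
```

Because the browser ignores the comment, the page renders normally while the crawler still sees the metadata.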

SERVER-SIDE DEVELOPMENT

Semantic portals require a means to store information in an RDF-enabled database, retrieve documents from the database, process RDF statements to infer knowledge, aggregate information from different sources (including other domains), and process RDF queries. Semantic middleware applications facilitate these tasks by providing a platform with access to the required functionality. Developers can access pre-existing modules for storing, retrieving, querying, and inferring knowledge by interfacing with a middleware environment via an application programming interface (API). Table 1 lists some of the most popular middleware environments. For some time the leading framework has been Jena4.
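To make the storage and aggregation tasks concrete, here is a minimal Python sketch that keeps RDF statements in a single relational table using the standard sqlite3 module. It does not reflect how Jena or any other environment in Table 1 stores data; the table layout, the function names store and objects_of, and the example.org sources are all assumptions chosen for illustration.

```python
import sqlite3

# In-memory database standing in for an RDF-enabled store (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples (s TEXT, p TEXT, o TEXT, source TEXT, "
    "UNIQUE (s, p, o, source))"
)


def store(conn, triples, source):
    """Insert (subject, predicate, object) triples, remembering where they came from."""
    conn.executemany(
        "INSERT OR IGNORE INTO triples VALUES (?, ?, ?, ?)",
        [(s, p, o, source) for s, p, o in triples],
    )
    conn.commit()


def objects_of(conn, subject, predicate):
    """Retrieve every distinct object recorded for a subject/predicate pair."""
    rows = conn.execute(
        "SELECT DISTINCT o FROM triples WHERE s = ? AND p = ? ORDER BY o",
        (subject, predicate),
    )
    return [row[0] for row in rows]


# Aggregate statements about the same resource from two hypothetical sources.
store(conn, [("ex:GreenTrust", "ex:topic", "Conservation")],
      "http://a.example.org/data.rdf")
store(conn, [("ex:GreenTrust", "ex:topic", "Recycling")],
      "http://b.example.org/data.rdf")

print(objects_of(conn, "ex:GreenTrust", "ex:topic"))
```

Keeping the source alongside each statement is one simple way to let information from several sites be merged while still being traceable.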

The middleware environments in Table 1 provide access to a type of program called a reasoner. Reasoners can be employed to check cardinality constraints and class membership, or to infer new knowledge from existing knowledge based on the semantics specified in an ontology. Examples of description logic reasoners are Racer Pro5, Pellet6, and Fact7. The environments also contain a query engine for processing RDF queries. Work on RDF query languages has been progressing for a number of years. Several different approaches have been tried, ranging from familiar-looking SQL-style syntaxes, such as RDQL (Seaborne, 2004) and Squish (Miller, 2001), to path-based languages like Versa (Ogbuji, 2005). SPARQL (Prud’hommeaux & Seaborne, 2005), a query language and protocol for accessing RDF data, is currently a W3C working draft. It is expected to become a W3C recommendation soon, a status that organizations generally treat as an industry standard.
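The two services described above, inference and querying, can be sketched in a few lines of Python. The example applies two RDFS-style rules (subclass transitivity and inheritance of class membership) by forward chaining, then answers a triple pattern in which None plays the role of a query variable. It is a toy illustration, not the algorithm used by Racer Pro, Pellet, or Fact, and the pattern matcher is far simpler than RDQL or SPARQL; the ex: names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical facts; the prefixed names are plain strings, not resolved URIs.
facts = {
    ("ex:Charity", SUBCLASS_OF, "ex:Organization"),
    ("ex:Organization", SUBCLASS_OF, "ex:Agent"),
    ("ex:GreenTrust", RDF_TYPE, "ex:Charity"),
    ("ex:GreenTrust", "ex:basedIn", "ex:Bristol"),
}


def infer(triples):
    """Apply two RDFS-style rules until no new triples appear (forward chaining)."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, o2))  # subclass transitivity
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))     # membership is inherited
        if new <= closed:
            return closed
        closed |= new


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


closed = infer(facts)
for triple in match(closed, s="ex:GreenTrust", p=RDF_TYPE):
    print(triple)
```

Only one membership statement was asserted for ex:GreenTrust; the other two returned by the query are derived from the class hierarchy, which is the kind of new knowledge a reasoner contributes.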

Figure 1. Annotated Web site

Table 1. Semantic middleware environments

Developer	Product	Category
Aidministrator http://www.aidministrator.nl/	Sesame, Spectacle	RDF(S) storage and retrieval, ontology-based information presentation
FZI – AIFB http://kaon.semanticweb.org/frontpage	KAON	Inference engine, knowledge management, and tools
HP Labs http://jena.sourceforge.net/	Jena	Inference engine, knowledge management, and tools
Intellidimension http://www.intellidimension.com/	RDF Gateway	RDF data management system
Kowari http://www.kowari.org/	Kowari Metastore	Metadata analysis and knowledge discovery, RDF storage
Ontoprise http://www.ontoprise.de/	Ontobroker	Inference middleware

Figure 2. Web search agent basic flow

The use of RDF and OWL tags in Web pages provides the opportunity for more advanced searching of Web content through the development of semantically enabled search engines. Major companies including Microsoft and Hewlett Packard have recently been investing in the development of a new breed of search engines called Web search agents. Web search agents crawl the Web searching for RDF and OWL documents, while at the same time providing an interface to the user. They facilitate user queries by determining and then executing a query plan, and can be designed to initiate middleware application tasks. Web search agents are typically developed in a Java programming environment because of Java’s powerful server-side programming capability, and because most of the middleware applications listed in Table 1 can be readily interfaced with Java. Figure 2 shows the typical workflow of a Web search agent.

ONTOLOGY DEVELOPMENT

Constructing an ontology is an important step in the development of semantic portals. There is no one correct method to model a domain, as there are always viable alternatives. Most of the time the best solution depends on the application that the developer has in mind and the tools that the developer uses to develop the ontology (Cristani & Roberta, 2005, p. 66). In recent years a series of different methodologies designed to assist with carrying out development tasks have been reported in the literature. Classical methods include Cyc (Lenat & Guha, 1990), Uschold and King’s method (Uschold & King, 1995), and Methontology (Fernandez-Lopez, Gomez-Perez, & Juristo, 1997). These methodologies provide common and structured guidelines which, if followed, can speed up the development process and improve the quality of the end result. The Methontology framework, supported by the ontology engineering environment WebODE8, is the best-known design methodology. It is presented in Table 2.

Table 2. Methontology framework

Name of the Phase	Input	Description	Output
Planning	Nothing: first step	Plan the main tasks to be done, the way in which they will be arranged, the time and resources that are necessary to perform these tasks	A project plan
Specification	A series of questions such as: “Why is this ontology being built and what are its intended uses and end-users?”	Identify ontology goals	Ontology requirement specification document written in natural language, using a set of intermediate representations or using competency questions. The document has to provide at least the following information: the purpose of the ontology (including its intended users, scenarios of use, end users, etc.), the level of formality used to codify terms and meanings (highly informal, semi-informal, semi-formal, rigorously formal ontologies), the scope, and its characteristics and granularity. Properties of this document are: concision, partial completeness, coverage of terms, the stopover problem and level of granularity of each and every term, and consistency of all terms and their meanings.
Conceptualization	A good specification document	Conceptualize in a model that describes the problem and its solution. Identify and gather all the useful and potentially usable domain knowledge and its meanings	A complete glossary of terms (including concepts, instances, verbs, and properties). Then, a set of intermediate representations such as concepts, classification trees, verb diagram, table of formulas, and table of rules. The aim is to allow the final user to ascertain whether or not an ontology is useful and to compare the scope and completeness of several ontologies, their reusability, and share-ability.
Formalization	Conceptual model	Transform conceptual model into a formal or semi-computable model, using frame-oriented or description logic representation systems	Formal conceptualization
Integration	Existing ontologies and the formal model	Processes of inclusion, polymorphic refinement, circular dependencies, and restriction. For example, select meta ontologies that better fit the conceptualization
Implementation	Formal model	Select target language	Create a computable ontology
Maintenance		Including and modifying definitions in the ontology	Guidelines for maintaining ontologies
Acquisition		Searching and listing knowledge sources through nonstructured interviews with experts to have detailed information on concepts, terms, meanings, and so on.	A list of the sources of knowledge and a rough description of how the process will be carried out and what techniques will be used.
Evaluation	Computable ontology	Technical judgment with respect to a frame of reference	A formal and correct ontology
Documentation			Specification document must have the property of concision

Figure 3. Protege ontology editor

Table 3. Ontology development tools

Developer	Product	Availability	Language Support
FZI – AIFB http://kaon.semanticweb.org/frontpage	KAON 1.2.7	Open source	KAON RDF(S)
IMG (University of Manchester) http://oiled.man.ac.uk/index.shtml	OilEd 3.5	Open source	RDF(S) OIL DAML+OIL OWL
Ontoprise http://www.ontoprise.de/content/e3/e43/index_eng.html	Ontostudio 1.4	Freeware Licenses	RDF(S) OWL F-Logic OXML
SMI (Stanford University) http://protege.stanford.edu/	Protege 3.2	Open source	XML RDF(S) XML Schema OWL
KMI (Open University) http://kmi.open.ac.uk/projects/webonto/	WebOnto 2.3	Free access	OCML RDF(S)
Mindswap http://www.mindswap.org/2004/SWOOP/	Swoop 2.3	Open source	RDF(S) OWL

A number of development and editing tools are also available to ease the complex and time-consuming task of building ontologies. Tools such as OilEd9, Ontostudio10, and Protege11 provide interfaces that help users carry out some of the main activities of the ontology development process. One of the oldest and most widely used tools is Protege. Protege (which now supports OWL) allows the user to define and edit ontology classes, properties, relationships, and instances using a tree structure. Ontologies can be exported into a variety of formats, including OWL, RDF(S), and XML Schema. Table 3 lists some of the ontology editing tools available today.

DESIGN ISSUES

The development of a semantic portal for a directory of UK environmental organizations, as documented by Reynolds, Shabajee, and Cayzer (2004), revealed that the design of such portals raises the following challenges:

• Moderation and access control: Decentralized portal design enables an interesting security model. In Reynolds’ test implementation, the aggregator has a record of which source URLs are deemed to be authoritative for a given organization. Each organization can then impose its own access and validation rules governing the update of that data. Some central administration is needed to moderate this “white list” of acceptable information sources. A semantic Web crawler approach that supports dynamic addition of new sources is one possibility, but it does not in itself address the problem of discovering “unsuitable” material.

• Navigation: The rich classification of portal items is only useful if the interface complexity is kept under control. Current experience suggests that a faceted browse approach modeled after the Flamenco project12 offers a good balance between expressiveness and simplicity.

• Provenance: The ability to mix community extensions and annotations with an organization’s own data is a powerful feature of the approach. However, it is important that when a user is navigating the site they are able to clearly separate authoritative data from third party data, and in the latter case find where it came from in order to decide how much to trust it. This raises design issues for efficient recording of provenance and trust model issues (delegation and so forth), but also user interface issues of how to make the provenance of items clear.

• Open-ended data model: Reynolds wishes to support the open-ended nature of the RDF data model so that new properties and classes (whether authoritative or third party) can be incrementally added. The visualization engine, though, needs to adapt to such changes without requiring new rendering templates to be created at each stage.
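The moderation and provenance issues above can be illustrated with a small Python sketch. Each harvested statement carries the URL it came from, a white list records which sources are authoritative for which organization, and the display label makes the origin visible to the reader. This is a sketch under stated assumptions, not the design used by Reynolds and colleagues; the class name Statement, the white list contents, and the example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    value: str
    source: str  # URL the statement was harvested from (its provenance)


# Hypothetical white list: organization -> source URLs deemed authoritative for it.
WHITE_LIST = {
    "ex:GreenTrust": {"http://greentrust.example.org/about.rdf"},
}


def is_authoritative(stmt):
    """True when the statement's source is white-listed for its subject."""
    return stmt.source in WHITE_LIST.get(stmt.subject, set())


def label_for_display(stmt):
    """Text a portal page could show so readers can judge how far to trust the data."""
    origin = "authoritative" if is_authoritative(stmt) else "third party"
    return f"{stmt.predicate}: {stmt.value} [{origin}, from {stmt.source}]"


harvested = [
    Statement("ex:GreenTrust", "ex:basedIn", "Bristol",
              "http://greentrust.example.org/about.rdf"),
    Statement("ex:GreenTrust", "ex:rating", "recommended",
              "http://reviews.example.net/notes.rdf"),
]

for stmt in harvested:
    print(label_for_display(stmt))
```

Because the label is generated from whatever predicate and value arrive, new properties can appear in the data without a new rendering template, which touches on the open-ended data model issue as well.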

ONTOVIEWS: A TOOL FOR CREATING SEMANTIC PORTALS

This section summarizes a tool for creating semantic portals called Ontoviews (Mäkelä, Hyvönen, Saarela, & Viljanen, 2004). Ontoviews assists with semantic portal development by providing developers with two important services: (i) a search engine based on the semantics of the content, and (ii) dynamic linking between pages based on semantic relations contained in the underlying knowledge base.

The Ontoviews architecture consists of three main components:

• Prolog-based logic server (Ontodella): Provides the system with reasoning services such as category generation and semantic recommendations.

• Java-based multifacet search engine (Ontogator): Defines and implements an RDF-based query interface that separates view-based search logic from the user interface. The interface is defined as an OWL ontology and can be used to query for category hierarchies of the ontology. It also facilitates keyword-based searches.

• User interface (OntoViews-C): Binds the previous two components together and is responsible for the user interfaces and interaction.

Figure 4. Ontoviews architecture

The Ontoviews search engine presents the end user with concepts for navigation in a hierarchical structure. The concepts, known as categories, are linked via semantic relations contained in the ontology. Figure 5 shows a sample query from the Museum of Finland semantic portal13, which was built using Ontoviews. In Museum of Finland, the content consists of collections of cultural artifacts and historical sites consolidated from several heterogeneous Finnish museum databases and annotated in RDF format using seven different ontologies. In the example in Figure 5, a search for “esp” matches the category Spain (“Espanja” in Finnish), and a list of semantically related categories is then displayed as hyperlinks. Searches may also be performed by navigating the hyperlinks alone, without using keywords.
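The behavior described above, where a keyword selects a category and the category in turn selects items and related categories, can be sketched in Python as follows. This is not Ontogator's implementation: the single-facet category tree, the item identifiers, the substring matching rule, and the function names are all assumptions for illustration, and only parent/child links stand in for the richer semantic relations of the real ontologies.

```python
# Hypothetical category tree for one facet: child -> parent (None marks the root).
PARENT = {"Eurooppa": None, "Espanja": "Eurooppa", "Suomi": "Eurooppa"}

# Hypothetical items, each annotated with one category from the facet.
ITEMS = {
    "item:fan-17": "Espanja",
    "item:knife-3": "Suomi",
    "item:map-9": "Eurooppa",
}


def find_categories(keyword):
    """Categories whose label contains the keyword, ignoring case."""
    needle = keyword.lower()
    return sorted(c for c in PARENT if needle in c.lower())


def is_within(category, ancestor):
    """True if category equals ancestor or lies below it in the tree."""
    while category is not None:
        if category == ancestor:
            return True
        category = PARENT[category]
    return False


def items_in(category):
    """Items annotated with the category or any of its descendants."""
    return sorted(i for i, c in ITEMS.items() if is_within(c, category))


def related_categories(category):
    """Parent and child categories, offered to the user as navigation links."""
    parent = PARENT[category]
    children = sorted(c for c, p in PARENT.items() if p == category)
    return ([parent] if parent else []) + children


for category in find_categories("esp"):
    print(category, items_in(category), related_categories(category))
```

Selecting a broader category such as the root returns every item beneath it, which is what allows navigation by hyperlinks alone, without any keyword.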

A developer may use Ontoviews to create a semantic portal by setting up the components on a server and then adapting the system to their own data. This adaptation requires a number of configuration steps. First, rules must be created describing how categories are generated and how items are connected to them for the view-based search. The next step is to create rules describing how links are generated for the recommendations. The last step involves changing the layout of the visual templates to suit the developer’s needs.

In summary, Ontoviews can greatly assist with the creation of semantic portals by facilitating some of the key requirements of such systems. The concept-based multifacet search engine exploits the semantic relations in the underlying knowledge base, providing the end user with a classification tree view containing semantic links. It offers different user interface functionality for different devices and is adaptable to a wide variety of semantic data.

CONCLUSION

The chapter has presented state-of-the-art tools and techniques used for the development of a new breed of Web portals known as semantic portals. These portals are based on semantic Web standards and use a rich domain ontology to index portal content. Semantic portals offer many advantages over traditional portals. These include the capability for knowledge to be inferred about portal information through the semantics built into the domain ontology, as well as the decentralized nature of semantic Web technologies, which contributes to more efficient portal maintenance. A commonly used method for annotating portal documents with RDF metadata is to embed the annotations between the head or comment tags of an HTML Web page. The Jena middleware environment is commonly used by developers in conjunction with Web search agents to facilitate information storage, inference, and query functions. A number of ontology development methodologies also exist to assist with the complex task of building ontologies. The Methontology framework, which is the best known of these, was presented in detail. The Protege application is a widely used tool for constructing ontologies. It is extremely popular with ontology developers because of its support for OWL and its tree-like navigation structure, which allows for easy editing. Previous semantic portal development initiatives have been shown to encounter design issues that still need addressing, such as moderation and access control, navigation, and provenance, as well as the problems associated with having an open-ended data model. On the positive side, tools such as Ontoviews are emerging that provide developers with important services to help reduce the complexity of many development tasks. With the current evolution toward a semantic Web, semantic portal development is likely to be a growing field in future years. With further research, development techniques and applications should steadily improve, making the task of building semantic portals much easier than it is today.

Figure 5. Sample query

KEY TERMS

Ontology: Shared and formal description of key concepts in a given domain.

Reasoner: Application capable of processing a static ontology model and inferring new facts based on semantics specified in the ontology.

Semantic Middleware: Programming environment that developers can interface with in order to carry out various information processing tasks such as ontology storage, reasoning, querying, and so forth.

Semantic Portal: Web portal based on semantic Web technologies.

Semantic Web: An extension of the current Web where information is given a precise meaning, enabling intelligent applications to process information more effectively.

Semantics: The implied meaning of data. Used to define what entities mean with respect to their roles in a system.

Web Search Agent: Web-based application with the ability to act autonomously and perform complex search tasks for the end user.