10.4 Integrating Ontologies with Gazetteers
In the previous section, we described how spatiotemporal extraction can reveal
the information contained in web documents, e.g., the evolution of storm events.
Ontologies can be used to provide additional semantic information about events.
To capture semantic information from the text documents, general domain-related
concepts (e.g., tornado, flood, landslide, and earthquake) from upper-level ontologies,
such as WordNet (http://wordnet.princeton.edu/), DOLCE (http://www.loa.istc.cnr.it/DOLCE.html),
and SUMO (http://www.ontologyportal.org/), can be adopted.
For this work, we use the open source software NeOn, drawing on domain concepts
in OpenCyc (http://sw.opencyc.org/), DOLCE, WordNet, GeoNames
(http://www.geonames.org/ontology/documentation.html), and the Proton Ontology
(http://proton.semanticweb.org/), to develop a hazard ontology. The ontology is organized as a
hierarchy specifying relations between classes (e.g., is_a relation). Each class has
one superclass and a set of subclasses that form the class hierarchy.
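The single-superclass hierarchy described above can be sketched in a few lines of Python. This is only an illustration, not the NeOn or OWL representation used in this work: the dictionary layout and the function names are assumptions, the class names are borrowed from the storm example discussed later in this section, and the root class "hazard" is hypothetical.

```python
# Hypothetical hazard class hierarchy: each class maps to its single superclass.
# Class names follow the storm example in the text; "hazard" as root is assumed.
SUPERCLASS = {
    "storm": "hazard",
    "thunderstorm": "storm",
    "ice storm": "storm",
    "dust storm": "storm",
    "blizzard": "storm",
    "supercell": "thunderstorm",
}


def ancestors(cls):
    """Return the chain of superclasses from cls up to the root."""
    chain = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        chain.append(cls)
    return chain


def is_a(cls, other):
    """True if cls is other, or other is a direct or indirect superclass of cls."""
    return cls == other or other in ancestors(cls)


def subclasses(cls):
    """Direct subclasses of cls, derived from the superclass map."""
    return sorted(c for c, parent in SUPERCLASS.items() if parent == cls)
```

Storing only the child-to-superclass link keeps the structure a tree, and the set of subclasses of any class can be derived from it on demand.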
Once complete, the hazard ontology is imported into GATE as a language
resource (an OWLIM ontology). A plugin tool named OntoGazetteer is used to link
terms from the gazetteers with classes in the ontology. Two steps are necessary
to create a connection between the gazetteers and ontology classes. First, domain-
related references in a set of training text documents are manually annotated and
stored as a linear list in a new gazetteer created for this task, the semantic gazetteer.
The training set is used for collecting domain-related references described in news
reports. Based on the results of processing the training set, an automatic mapping
between the gazetteers and the ontology is constructed. As a second step, rules are
made to link the newly created lists in the semantic gazetteer and the ontology.
The lists are treated as classes in the ontology. This step associates references in the
gazetteer with terms in the ontology by reasoning over the hierarchical relations. For
example, assume the term supercell is identified in one of the training documents
and also exists in the ontology as a subclass of thunderstorm, which, along with
ice storm, dust storm, and blizzard, is a subclass of storm. Because the is_a relation
is transitive, supercell therefore is_a storm. The spatial gazetteer includes a set of lists, such as lists of
cities, counties, states, and countries. Each list in the gazetteer contains a set of
spatial references, and these places are treated as instances of classes in the ontology.
With rules implemented in GATE, the semantics are processed, integrating the
gazetteers and the domain ontology. In this way, not only are spatial and temporal
terms extracted from text documents, but semantics of the spatiotemporal events
based on their relations are also captured. For instance, when supercell is associated
with a geographic location, its semantics (e.g., thunderstorm or, more generally, storm)
can be automatically linked to that location as well.
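The linking described above can be illustrated with a minimal Python sketch. It does not use GATE, OntoGazetteer, or their rule language. It is a plain-Python stand-in in which every name (SEMANTIC_GAZETTEER, SPATIAL_GAZETTEER, lookup, generalize, link_events_to_places) and every gazetteer entry is hypothetical. Each gazetteer list is mapped to one ontology class, terms are matched in a sentence, and each hazard mention is paired with the places found in the same sentence, together with its more general classes.

```python
import re

# Hypothetical is_a links (child -> superclass); not the authors' ontology.
SUPERCLASS = {"supercell": "thunderstorm", "thunderstorm": "storm"}

# Hypothetical gazetteer lists, each mapped to one ontology class.
SEMANTIC_GAZETTEER = {
    "supercell": ["supercell", "supercells"],
    "thunderstorm": ["thunderstorm", "thunderstorms"],
}
SPATIAL_GAZETTEER = {
    "City": ["Wichita", "Norman"],
    "State": ["Kansas", "Oklahoma"],
}


def lookup(text, gazetteer):
    """Return sorted (offset, matched text, class) for each gazetteer term found."""
    hits = []
    for cls, terms in gazetteer.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                hits.append((m.start(), m.group(0), cls))
    return sorted(hits)


def generalize(cls):
    """Return cls followed by all of its superclasses."""
    chain = [cls]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain


def link_events_to_places(sentence):
    """Pair every hazard mention in the sentence with every place mention."""
    hazards = lookup(sentence, SEMANTIC_GAZETTEER)
    places = lookup(sentence, SPATIAL_GAZETTEER)
    return [
        {
            "place": p_text,
            "place_class": p_cls,
            "event": h_text,
            "semantics": generalize(h_cls),
        }
        for _, h_text, h_cls in hazards
        for _, p_text, p_cls in places
    ]
```

Pairing mentions by sentence co-occurrence is a simplification chosen for brevity; a rule-based pipeline could apply stricter conditions before linking an event to a place.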
In the next section, articles describing tornadoes that impacted the central United
States in April 2012 serve as a case study to illustrate how spatiotemporal and
semantic information can be extracted automatically from web documents and
visualized for users.