Information Technology Reference
In-Depth Information
on WOM Scouter[ 9 ] and a bootstrapping method based on ONTOMO[ 10 ]. In
the plant information, the plant names are easily collected from a list on any
gardening web sites, and also we have already defined the property names from
the aspect of the plant cultivation. We thus need the value of the property for
each plant. The process of our LOD extraction is shown in Fig. 2 .
Fig. 2. Process of LOD content generation
We first create a keyword list, which consists of an instance name, that is,
plant name and a logical disjunction of the property names, such as “basil”
(“Japanese name” OR “English name” OR “country of origin” OR ...), and
then search on a web search engine, and receive more than 100 web pages. We
then retrieve the page contents, except for PDF files and also take a Google
PageRank value for each page.
The bootstrapping method extracts specific patterns of the document object
model (DOM) tree in the page contents using some keys, which are the property
names or their synonyms, and then applies the patterns to other web pages for
the extraction of other property values. The method is used for the extraction
from structured parts of the page contents like tables and lists.
There, however, are a number of gardening web sites, where most of the page
contents are described in plain text. We thus developed an extraction method
using dependency parsing, since a triple
< plantname, property, value >
is
regarded as a dependency relation
. The method follows
dependency relations in a sentence from a seed term, which is a plant name, a
property name, or their synonym, and then extracts a triple, or a triple without
a subject in the case of no subject within a sentence (the subject will be filled
with a plant name in the keyword list later).
Next, we select property values that match with co-occurrence strings which
are prepared for each property name, for example, the “temperature” property
< subject, verb, object >
Search WWH ::




Custom Search