at extracting large collections of facts (e.g., names of scientists or politicians) from the
Web in an unsupervised, domain-independent, and scalable manner. They also argue
for lightweight NLP technologies and follow an approach to chunk extraction similar
to ours (but without chunk-pair-distance statistics). Although we do not yet explicitly
extract relations in the sense of standard relation extraction, our topic graph extraction
process, together with the clustering mechanism, can be extended to support relation
extraction as well, which will be a focus of our future research.
8 Conclusions and Outlook
We presented an approach to interactive topic graph extraction for the exploration of web
content. The user issues the initial information request to the system online in the
form of a query topic description. This topic query is used to construct an initial
topic graph from a set of web snippets returned by a standard search engine. At this
point, the topic graph already displays strongly correlated relevant entities
and terms. The user can then request more detailed information through multiple
iterations.
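The construction step summarized above can be illustrated with a minimal sketch. It is not the implementation of the presented system: it assumes that each snippet has already been split into chunks by some chunker, and the function names build_topic_graph and neighbours, the parameters max_distance and min_weight, and the sample snippets are all hypothetical. Pairs of chunks that occur close together are counted across snippets, and only pairs seen often enough are kept as edges.

```python
from collections import defaultdict
from itertools import combinations


def build_topic_graph(snippets, max_distance=3, min_weight=2):
    """Count chunk pairs that co-occur within max_distance positions.

    snippets: list of chunk lists (assumed output of a hypothetical chunker).
    Returns {(chunk_a, chunk_b): weight} with each pair stored in sorted order.
    """
    weights = defaultdict(int)
    for chunks in snippets:
        for (i, a), (j, b) in combinations(enumerate(chunks), 2):
            # combinations yields i < j, so j - i is the chunk distance
            if a != b and j - i <= max_distance:
                weights[tuple(sorted((a, b)))] += 1
    # keep only pairs observed at least min_weight times (illustrative threshold)
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def neighbours(graph, topic):
    """Return (neighbour, weight) pairs of a topic, strongest first."""
    found = []
    for (a, b), w in graph.items():
        if topic == a:
            found.append((b, w))
        elif topic == b:
            found.append((a, w))
    return sorted(found, key=lambda item: -item[1])


# Hypothetical pre-chunked snippets for the query "Jim Clark"
snippets = [
    ["Jim Clark", "Formula One", "Lotus"],
    ["Jim Clark", "Netscape", "founder"],
    ["Jim Clark", "Formula One", "champion"],
]
graph = build_topic_graph(snippets, max_distance=2, min_weight=2)
related = neighbours(graph, "Jim Clark")
```

In an interactive setting, selecting a node would trigger a new search whose snippets are passed through the same function, which is one plausible way to realize the iterations mentioned above.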
A prototype of the system has been realized with two specialized mobile touch user
interfaces, one for the iPad and one for the iPhone, both of which receive
the same topic graph data structure as input. We believe that our approach to interactive
topic graph extraction and exploration, together with its implementation on a mobile
device, helps users explore and find new and interesting information on topics about which
they have only a vague idea or even no idea at all.
Our future work will consider integrating open shared knowledge bases, e.g., Wikipedia
or similar open web knowledge sources, into the learn search activity, extracting
relations, and finally merging information from these different resources.
We have already embedded Wikipedia's infoboxes as background
knowledge but have not yet integrated them into the extracted web topic graphs; cf. [12] for
more details. Once this is done, we will investigate the role of Wikipedia and similar sources as
a basis for disambiguating the topic graphs. For example, for a query like
“Jim Clark” we currently cannot distinguish whether the extracted associated topics
are about the famous Formula One racer, the Netscape founder, or yet
another person.
In this context, the extraction of semantic relations will be important. Currently, the
extracted topic pairs only express some semantic relatedness, but the nature and meaning
of the underlying relationship remain unclear. We have begun investigating this problem
by extending our approach of chunk-pair-distance extraction to the extraction of triples
of chunks, with promising initial results.
Acknowledgements. The presented work was partially supported by grants from the
German Federal Ministry of Economics and Technology (BMWi) to the DFKI THESEUS
project (FKZ: 01MQ07016).
 