model capture connections, but are generally inefficient to compute. Moreover, most
of the existing techniques do not exploit mode information which is usually present in
grid databases. Without mode information, keyword proximity techniques may have
difficulty in presenting results, and more importantly, they return many irrelevant
results. For example, the LCA (Lowest Common Ancestor) concepts for keyword
proximity search based on a tree model may return the overwhelmingly large root of the
whole grid database.
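The root-as-result problem can be illustrated with a minimal sketch in Python. It assumes a toy tree stored as parent pointers; every node name (`db`, `paper1`, and so on) is hypothetical and not taken from any system discussed here. When the keywords match nodes in different subtrees, the lowest common ancestor is the root of the whole database.

```python
# Toy tree stored as child -> parent; all node names are hypothetical.
PARENT = {
    "db": None,
    "paper1": "db",
    "paper2": "db",
    "title1": "paper1",
    "author1": "paper1",
    "title2": "paper2",
}

def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def lca(a, b):
    """Lowest common ancestor of a and b, found by walking parent pointers."""
    seen = set(ancestors(a))
    # ancestors(b) is ordered bottom-up, so the first hit is the lowest one.
    for node in ancestors(b):
        if node in seen:
            return node
    return None

# Keywords matched inside one paper: the LCA is that paper.
same_paper = lca("title1", "author1")
# Keywords matched in two different papers: the LCA is the whole database.
across_papers = lca("title1", "title2")
```

Here `same_paper` is a useful answer, while `across_papers` is the database root, which tells the user nothing.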
With interested object classes, the most intuitive result of keyword proximity
search is a list of interested objects containing all keywords. We call these interested
objects ICAs (Interested Common Ancestors), in contrast to the well-known LCA
(Lowest Common Ancestor) concept. More importantly, we propose the IRA
(Interested Related Ancestors) concept to capture conceptual connections between
interested objects and to include more relevant results that do not contain all keywords.
An IRA result is a pair of objects that together contain all keywords and are connected
by conceptual connections. An object is an IRA object if it belongs to some IRA pair.
For example, for the search “grid searching processing”, a paper titled “searching
processing” that cites or is cited by “grid” papers is considered an IRA object. Further,
we propose RelevanceRank to rank IRA objects according to their relevance
scores with respect to the search. RelevanceRank is application dependent. As an
intuitive example, in DBLP, for the search “grid searching processing”, a “searching
processing” paper that cites or is cited by many “grid” papers is ranked higher than
another “searching processing” paper that cites or is cited by only one “grid” paper.
Other ranking metrics can also be incorporated into RelevanceRank. For example, for
the search “John Smith”, we can use proximity rank to rank papers with the author
“John Smith” higher than papers with co-authors “John” and “Smith”.
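The ICA and IRA definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions: the papers, their title keywords, and the citation edges are invented, citations stand in for "conceptual connections", and the score (the number of IRA pairs an object belongs to) is a simple stand-in, not the RelevanceRank formula, which the text describes as application dependent.

```python
from itertools import combinations

# Hypothetical papers: id -> set of title keywords.
PAPERS = {
    "p1": {"searching", "processing"},
    "p2": {"searching", "processing"},
    "p3": {"grid"},
    "p4": {"grid"},
    "p5": {"grid", "searching", "processing"},
}
# Hypothetical citation edges (citing, cited); treated as undirected links.
CITES = [("p1", "p3"), ("p4", "p1"), ("p2", "p3")]

def ica(query):
    """Interested objects that contain every query keyword on their own."""
    return sorted(p for p, words in PAPERS.items() if query <= words)

def ira_pairs(query):
    """Connected pairs of interested objects that together cover the query."""
    linked = {frozenset(edge) for edge in CITES}
    pairs = []
    for a, b in combinations(sorted(PAPERS), 2):
        if frozenset((a, b)) in linked and query <= (PAPERS[a] | PAPERS[b]):
            pairs.append((a, b))
    return pairs

def pair_count_scores(query):
    """Stand-in relevance score: number of IRA pairs an object belongs to."""
    scores = {}
    for a, b in ira_pairs(query):
        scores[a] = scores.get(a, 0) + 1
        scores[b] = scores.get(b, 0) + 1
    return scores

QUERY = {"grid", "searching", "processing"}
ica_result = ica(QUERY)
ira_result = ira_pairs(QUERY)
scores = pair_count_scores(QUERY)
```

In this toy data only `p5` contains all keywords by itself, while `p1` and `p2` become IRA objects through their citation links to "grid" papers. Because `p1` is linked to two "grid" papers and `p2` to one, `p1` receives the higher stand-in score, mirroring the DBLP example.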
Experimental evaluation shows our approach is superior to most existing academic
systems in terms of execution time and result quality. Our approach is also superior or
comparable to commercial systems such as Google Scholar and Microsoft Libra in
terms of result quality.
6 Conclusion
One of the important areas in the organization of semistructured data is providing
algorithms that enable efficient searching of the data. Many researchers have
investigated matching linked patterns, using clever matching algorithms and
labeling schemes that make it possible to determine the relationships between nodes
in a tree without traversing the tree.
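The text does not say which labeling scheme is meant. As one common possibility, the sketch below uses interval (region) labels in Python: each node receives a `(start, end, level)` triple in a single depth-first pass, after which ancestor and parent tests are plain comparisons on labels. The tree and all names are hypothetical.

```python
# Hypothetical tree: node -> list of children.
CHILDREN = {
    "db": ["paper1", "paper2"],
    "paper1": ["title1", "author1"],
    "paper2": ["title2"],
    "title1": [],
    "author1": [],
    "title2": [],
}

def assign_labels(root):
    """Label each node with (start, end, level) in one depth-first pass."""
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in CHILDREN[node]:
            visit(child, level + 1)
        counter += 1
        labels[node] = (start, counter, level)

    visit(root, 0)
    return labels

def is_ancestor(a, d):
    """True if label a strictly encloses label d."""
    return a[0] < d[0] and d[1] < a[1]

def is_parent(a, d):
    """True if a encloses d and sits exactly one level above it."""
    return is_ancestor(a, d) and a[2] + 1 == d[2]

LABELS = assign_labels("db")
```

Once the labels are assigned, a check such as `is_ancestor(LABELS["db"], LABELS["title1"])` needs only the two labels and never walks the tree.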
In the future we will study how to use other concepts captured in ORASS mode
diagrams to further optimize the evaluation of linked pattern queries, provide
guidelines on where these optimizations are worthwhile, and show the improvement in
processing speed through experimentation. In particular, we will look at how
specific information in linked queries, such as parent-child and ancestor-descendant
relationships, negation, ordering of nodes, constant values, and output nodes,
interacts with optimization.