Using Concepts in Grid Data Organization - Advanced Research on Computer Education, Simulation and Modeling


model capture connections, but are generally inefficient to compute. Moreover, most existing techniques do not exploit the mode information that is usually present in grid databases. Without mode information, keyword proximity techniques may have difficulty presenting results and, more importantly, they return many irrelevant results. For example, the LCA (Lowest Common Ancestor) concept for keyword proximity search over a tree model may return the overwhelmingly large root of the whole grid database.
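To see why plain LCA semantics can degenerate to the root, the following Python sketch computes the LCA of one matching node per keyword over a toy tree. The tree, the node names (`db`, `paper1`, `title1`, ...) and the helper functions are hypothetical illustrations, and taking only the first match per keyword is a simplification; real LCA-based engines consider every combination of matches.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy tree, stored as child -> parent (the root's parent is None).
PARENT: Dict[str, Optional[str]] = {
    "db": None,
    "paper1": "db",
    "paper2": "db",
    "title1": "paper1",
    "title2": "paper2",
}

# Hypothetical keyword content of the leaf nodes.
TEXT: Dict[str, Set[str]] = {
    "title1": {"grid"},
    "title2": {"searching", "processing"},
}


def path_to_root(node: str) -> List[str]:
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    parent = PARENT[node]
    while parent is not None:
        path.append(parent)
        parent = PARENT[parent]
    return path


def lca(a: str, b: str) -> str:
    """Lowest common ancestor: the first node on b's root path that is also on a's."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def lca_of_keywords(keywords: List[str]) -> str:
    """Simplification: take the first node matching each keyword and fold LCA over them."""
    nodes = []
    for kw in keywords:
        matches = [n for n, words in TEXT.items() if kw in words]
        if not matches:
            raise KeyError(f"no node contains {kw!r}")
        nodes.append(matches[0])
    result = nodes[0]
    for node in nodes[1:]:
        result = lca(result, node)
    return result


print(lca_of_keywords(["searching", "processing"]))  # title2: both keywords in one node
print(lca_of_keywords(["grid", "searching"]))        # db: keywords in different papers
```

When the keywords fall under different papers, the only common ancestor is the root `db`, which is exactly the overly large result described above.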

Given the classes of interested objects, the most intuitive result of a keyword proximity search is a list of interested objects that contain all keywords. We call these interested objects ICAs (Interested Common Ancestors), in contrast to the well-known LCA (Lowest Common Ancestor) concept. More importantly, we propose the IRA (Interested Related Ancestors) concept to capture conceptual connections between interested objects and to include more relevant results that do not contain all keywords. An IRA result is a pair of objects that together contain all keywords and are connected by a conceptual connection. An object is an IRA object if it belongs to some IRA pair.
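The ICA and IRA definitions can be sketched directly in Python. This is a minimal illustration under assumed data, not the authors' implementation: papers are the interested objects, each paper is reduced to a hypothetical set of keywords, and citations in either direction stand in for the conceptual connections. The identifiers `PAPERS`, `CITES`, `ica`, `ira_pairs` and `ira_objects` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical interested objects: paper id -> keywords appearing in the paper.
PAPERS: Dict[str, Set[str]] = {
    "p1": {"grid", "searching", "processing"},
    "p2": {"searching", "processing"},
    "p3": {"grid"},
    "p4": {"grid"},
    "p5": {"searching", "processing"},
}

# Hypothetical conceptual connections: citations, treated as undirected
# because "cites or is cited by" both count.
CITES: Set[FrozenSet[str]] = {
    frozenset(edge) for edge in [("p2", "p3"), ("p2", "p4"), ("p5", "p3")]
}


def ica(keywords: List[str]) -> List[str]:
    """ICA results: interested objects that contain every keyword on their own."""
    wanted = set(keywords)
    return sorted(p for p, words in PAPERS.items() if wanted <= words)


def ira_pairs(keywords: List[str]) -> List[Tuple[str, str]]:
    """IRA results: connected pairs of objects that together contain every keyword."""
    wanted = set(keywords)
    pairs = []
    for a, b in combinations(sorted(PAPERS), 2):
        connected = frozenset((a, b)) in CITES
        if connected and wanted <= PAPERS[a] | PAPERS[b]:
            pairs.append((a, b))
    return pairs


def ira_objects(keywords: List[str]) -> List[str]:
    """An object is an IRA object if it belongs to at least one IRA pair."""
    return sorted({p for pair in ira_pairs(keywords) for p in pair})


query = ["grid", "searching", "processing"]
print(ica(query))          # ['p1']
print(ira_pairs(query))    # [('p2', 'p3'), ('p2', 'p4'), ('p3', 'p5')]
print(ira_objects(query))  # ['p2', 'p3', 'p4', 'p5']
```

Only `p1` is an ICA result, while the citation links surface `p2` to `p5` as IRA objects even though none of them contains all three keywords by itself.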

For example, for the search “grid searching processing”, papers titled “searching processing” that cite or are cited by “grid” papers are considered IRA objects.

Further, we propose RelevanceRank to rank IRA objects according to their relevance scores for the search. RelevanceRank is application dependent. As an intuitive example, in DBLP, for the search “grid searching processing”, a “searching processing” paper that cites or is cited by many “grid” papers is ranked higher than another “searching processing” paper that cites or is cited by only one “grid” paper. Other ranking metrics can also be incorporated into RelevanceRank. For example, for the search “John Smith”, we can use proximity rank to rank papers with the author “John Smith” higher than papers with co-authors named “John” and “Smith”.
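RelevanceRank is application dependent and its scoring function is not specified here, so the sketch below uses a hypothetical stand-in: an IRA object's score is the number of distinct IRA partners it has, which mirrors the DBLP example where a paper connected to many “grid” papers outranks one connected to a single “grid” paper. The `proximity_score` helper is likewise an assumed illustration of the “John Smith” proximity rule, not the paper's metric. The data and `ira_pairs` are repeated from a hypothetical toy setup so the block runs on its own.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical papers (id -> keywords) and undirected citation links.
PAPERS: Dict[str, Set[str]] = {
    "p1": {"grid", "searching", "processing"},
    "p2": {"searching", "processing"},
    "p3": {"grid"},
    "p4": {"grid"},
    "p5": {"searching", "processing"},
}
CITES: Set[FrozenSet[str]] = {
    frozenset(edge) for edge in [("p2", "p3"), ("p2", "p4"), ("p5", "p3")]
}


def ira_pairs(keywords: List[str]) -> List[Tuple[str, str]]:
    """Connected pairs of papers that together contain every keyword."""
    wanted = set(keywords)
    return [
        (a, b)
        for a, b in combinations(sorted(PAPERS), 2)
        if frozenset((a, b)) in CITES and wanted <= PAPERS[a] | PAPERS[b]
    ]


def relevance_rank(keywords: List[str]) -> List[Tuple[str, int]]:
    """Hypothetical RelevanceRank stand-in: score = number of distinct IRA partners."""
    partners = Counter()
    for a, b in ira_pairs(keywords):
        partners[a] += 1
        partners[b] += 1
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(partners.items(), key=lambda item: (-item[1], item[0]))


def proximity_score(authors: List[str], query: str) -> int:
    """Hypothetical proximity rule: 2 if one author name holds all query terms,
    1 if the terms are spread across different co-authors, 0 otherwise."""
    terms = query.lower().split()
    names = [set(author.lower().split()) for author in authors]
    if any(all(t in name for t in terms) for name in names):
        return 2
    if all(any(t in name for name in names) for t in terms):
        return 1
    return 0


print(relevance_rank(["grid", "searching", "processing"]))
# [('p2', 2), ('p3', 2), ('p4', 1), ('p5', 1)]
print(proximity_score(["John Smith", "Ann Lee"], "John Smith"))   # 2
print(proximity_score(["John Doe", "Mary Smith"], "John Smith"))  # 1
```

Here `p2`, linked to two “grid” papers, outranks `p5`, linked to one. A real scoring function would likely weight partners and combine several metrics rather than simply counting links.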

Experimental evaluation shows that our approach is superior to most existing academic systems in terms of execution time and result quality. Our approach is also superior or comparable to commercial systems such as Google Scholar and Microsoft Libra in terms of result quality.

6 Conclusion

One of the important areas in the organization of semistructured data is providing algorithms that enable efficient searching of the data. Many researchers have investigated matching linked patterns, using efficient matching algorithms and labeling schemes that determine the relationships between nodes in a tree without traversing the tree.
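As an illustration of such labeling schemes, the sketch below uses the well-known region (interval) encoding, which is not necessarily the scheme used in the works surveyed here. Each node receives a `(start, end, level)` label from one depth-first traversal; afterwards, ancestor-descendant and parent-child tests are constant-time comparisons that never walk the tree. The tree, node names and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Label:
    start: int  # position when the node is first entered
    end: int    # position after all descendants have been labeled
    level: int  # depth, with the root at level 0


def label_tree(children: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Assign region labels with a single depth-first traversal."""
    labels: Dict[str, Label] = {}
    counter = 0

    def visit(node: str, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child, level + 1)
        labels[node] = Label(start, counter, level)
        counter += 1

    visit(root, 0)
    return labels


def is_ancestor(a: Label, d: Label) -> bool:
    """a is a proper ancestor of d iff d's region nests strictly inside a's."""
    return a.start < d.start and d.end < a.end


def is_parent(a: Label, d: Label) -> bool:
    """Parent-child additionally requires d to be exactly one level below a."""
    return is_ancestor(a, d) and d.level == a.level + 1


# Hypothetical document tree.
tree = {"db": ["paper1", "paper2"], "paper1": ["title1", "author1"]}
labels = label_tree(tree, "db")
print(is_ancestor(labels["db"], labels["title1"]))      # True
print(is_parent(labels["db"], labels["title1"]))        # False (grandchild)
print(is_parent(labels["paper1"], labels["title1"]))    # True
print(is_ancestor(labels["paper2"], labels["title1"]))  # False
```

The labeling pass costs one traversal, after which any number of structural checks during pattern matching are simple integer comparisons.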

In the future, we will study how to use other concepts captured in ORASS mode diagrams to further optimize the evaluation of linked pattern queries, provide guidelines on where these optimizations are worthwhile, and demonstrate the resulting improvement in processing speed through experiments. In particular, we will examine how specific information in linked queries interacts with these optimizations, such as parent-child and ancestor-descendant relationships, negation, node ordering, constant values, and output nodes.
