Biomedical Engineering Reference
Today, most of the potential links between data in digital form aren't readily available because the
relevant data, when they exist, are in disparate databases. In addition, each database is typically
based on different and incompatible database technologies and uses different languages and
vocabularies to access data. These incompatibilities are especially significant when non-textual data,
such as 3D images of protein structures, accessed by author-specified keywords, need to be linked
with nucleotide sequences in other databases. Because each database is typically created as a stand-
alone application to support one function, linking between databases is most often an afterthought.
Although static links between databases can be established programmatically, a more common
approach is to create links dynamically by using search engines. In addition, even when static links
are established between databases, extracting meaningful content from these linked databases
invariably involves using a search engine of some sort.
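
The difference between static and dynamic links can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two in-memory "databases", the record identifiers (STRUCT-001, SEQ-A, and so on), and the function names are hypothetical and do not correspond to any real database or API. A static link is looked up in a cross-reference table built in advance, whereas a dynamic link is created at query time by matching author-specified keywords against free-text descriptions.

```python
# Hypothetical records; all identifiers and keywords are illustrative only.
structures = {
    "STRUCT-001": {"keywords": {"hemoglobin", "oxygen transport"}},
    "STRUCT-002": {"keywords": {"insulin", "hormone"}},
}
sequences = {
    "SEQ-A": {"description": "hemoglobin beta chain mRNA"},
    "SEQ-B": {"description": "insulin precursor gene"},
    "SEQ-C": {"description": "hemoglobin alpha chain gene"},
}

# Static links: a cross-reference table established programmatically, in advance.
static_links = {"STRUCT-001": ["SEQ-A"]}


def follow_static_links(structure_id):
    """Return the sequence IDs recorded in the precomputed link table."""
    return list(static_links.get(structure_id, []))


def search_dynamic_links(structure_id):
    """Create links at query time by matching keywords against descriptions."""
    keywords = structures[structure_id]["keywords"]
    hits = []
    for seq_id, record in sequences.items():
        text = record["description"].lower()
        if any(keyword.lower() in text for keyword in keywords):
            hits.append(seq_id)
    return sorted(hits)
```

In this toy setting the static table knows only the links someone recorded, while the keyword search finds every record whose description happens to contain a keyword, which is also why its results depend so heavily on vocabulary.
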
As anyone who has surfed the Internet has discovered, a search isn't necessarily successful, and may
turn up nothing or thousands of irrelevant links. Thus, the relevance of the dynamic database links
created by interacting with a typical Web-accessed search engine is primarily a function of the search
engine's selectivity and sensitivity, the ingenuity and knowledge of the search engine user, and the
availability of relevant content. In addition, the amount of irrelevant content and its similarity with
the desired content, together with the peculiarities of database design, limit the ease of finding the
sought-after data.
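
One way to make selectivity and sensitivity concrete is to score a result set against a set of records known to be relevant. The sketch below assumes that sensitivity means the fraction of relevant records that the search retrieved, and that selectivity means the fraction of retrieved records that are relevant; the text does not define either term, so both definitions, along with the function name search_quality, are assumptions.

```python
def search_quality(retrieved, relevant):
    """Return (sensitivity, selectivity) for one search.

    Assumed definitions (not taken from the text):
      sensitivity = relevant records retrieved / all relevant records
      selectivity = relevant records retrieved / all records retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    # Guard against empty sets so an empty search scores 0.0 instead of failing.
    sensitivity = len(true_hits) / len(relevant) if relevant else 0.0
    selectivity = len(true_hits) / len(retrieved) if retrieved else 0.0
    return sensitivity, selectivity
```

Under these definitions, a search that turns up nothing has zero sensitivity, and a search that returns thousands of irrelevant links alongside a few useful ones has high sensitivity but very low selectivity.
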
The exponentially increasing amount of data available over the Internet, from gene sequences and
clinical disease findings to related issues in other fields, is accessible primarily through search engine
technologies. As such, this chapter explores the status of search engine technology, focusing on
bioinformatics resources, within the context of the overall knowledge management of online data.
"The Search Process" section of this chapter introduces many of the challenges and concepts involved
in a typical search of molecular biology databases accessible through the Internet, based on the
Entrez integrated searching environment. "Search Engine Technology" explores the various
technologies that researchers can use to differentiate required data from the noise, from portals and
intelligent agents, to natural-language processing (NLP) and other user interface tools. In particular,