vocabulary may be updated periodically, forcing whoever manages the database to expend the
resources necessary to re-index areas of the index affected by the updates. Failing to do so would
likely lead to user frustration, because the vocabulary that users search with may no longer match
the version used to index the database, either because they aren't aware of the update or because
they don't have access to the older version of the vocabulary for reference.
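The re-indexing cost described above can be limited to the records that actually use a changed term. The following is a minimal sketch of that idea, not a description of any particular database: the vocabulary layout (term mapped to a concept code), the record index, and every name and code in it are hypothetical.

```python
def affected_records(index, old_vocab, new_vocab):
    """Return IDs of records indexed with terms the update removed or redefined.

    index:     dict mapping record ID -> list of index terms (hypothetical layout)
    old_vocab: dict mapping term -> concept code before the update
    new_vocab: dict mapping term -> concept code after the update
    """
    # A term is "changed" if it disappeared or now points at a different concept.
    changed = {
        term
        for term, code in old_vocab.items()
        if new_vocab.get(term) != code
    }
    return sorted(
        rec_id for rec_id, terms in index.items() if changed & set(terms)
    )


# Made-up example data: "tumor marker" is renamed in the new release.
old_vocab = {"neoplasm": "C001", "tumor marker": "C002"}
new_vocab = {"neoplasm": "C001", "biomarker, tumor": "C002"}
index = {
    "rec1": ["neoplasm"],
    "rec2": ["tumor marker", "neoplasm"],
    "rec3": ["tumor marker"],
}

to_reindex = affected_records(index, old_vocab, new_vocab)
print(to_reindex)
```

Only the records returned need to be re-indexed; records that use unchanged terms can be left alone.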
For internal databases where the user population can be informed about changes in indexing, there is
much more flexibility in selecting or developing an indexing and search vocabulary. The most
common approaches to developing an in-house controlled vocabulary range from a totally
unconstrained ad-hoc system to a huge, potentially unwieldy combination of public
vocabularies. The ad-hoc approach of creating a new vocabulary as data are generated is reasonable
only if the vocabulary is relatively small and isn't expected to grow beyond 1,000 or 2,000 words. For
larger indexing tasks requiring the breadth of a published controlled vocabulary, a reasonable
approach is to modify a standard vocabulary, adding granularity in specific areas. This approach
takes advantage of an extensive vocabulary that may exceed 100,000 terms, but comes at a cost of
incompatibility with the published standard. The approach of combining standards is clearly the most
challenging, because the inevitable redundancies and internal inconsistencies among the vocabularies
used must somehow be controlled. Whether the advantage of this approach (a vocabulary
that exceeds several hundred thousand terms and is likely to cover the spectrum of indexing
needs) is worth the investment depends on the scope of the database project and the resources
available.
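To make the redundancy and inconsistency problem concrete, the sketch below merges several vocabularies, collapses terms that differ only in case or spacing, and sets aside terms that the sources define differently. It is an illustration under assumed conventions: the data layout, the `normalize` rule, and the sample terms are all hypothetical, and real vocabularies would need far richer matching (synonyms, hierarchies, identifiers).

```python
from collections import defaultdict


def normalize(term):
    """Collapse case and whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def merge_vocabularies(vocabularies):
    """Merge vocabularies given as {source name: {term: concept label}}.

    Returns (merged, conflicts):
      merged    - normalized term -> concept, where all sources agree
      conflicts - normalized term -> {source: concept}, where they disagree
    """
    seen = defaultdict(dict)
    for source, vocab in vocabularies.items():
        for term, concept in vocab.items():
            seen[normalize(term)][source] = concept

    merged, conflicts = {}, {}
    for term, by_source in seen.items():
        concepts = set(by_source.values())
        if len(concepts) == 1:
            merged[term] = concepts.pop()      # redundant entries collapse to one
        else:
            conflicts[term] = dict(by_source)  # inconsistency needs manual review
    return merged, conflicts


# Made-up sample vocabularies.
vocab_a = {"Kinase": "enzyme", "Cell line": "cultured cells"}
vocab_b = {"kinase": "enzyme", "cell  line": "cell lineage"}

merged, conflicts = merge_vocabularies({"a": vocab_a, "b": vocab_b})
print(merged)
print(conflicts)
```

The size of the `conflicts` result gives a rough measure of how much manual reconciliation a given combination of standards would require.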
Regardless of whether a controlled vocabulary is designed from scratch or is based on a published
standard, the main technological issue is providing a means of using it consistently and without error.
For example, without rudimentary utilities such as text auto-completion, simply misspelling a search
term can render the sought-after data inaccessible.
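As an illustration of such a utility, the sketch below offers prefix auto-completion against a controlled vocabulary and suggests near matches for a misspelled term. It assumes the vocabulary fits in memory as a list of strings; the class name, the sample terms, and the similarity cutoff are hypothetical choices. The fuzzy matching relies on `difflib.get_close_matches` from the Python standard library.

```python
import bisect
import difflib


class Vocabulary:
    """Small in-memory controlled vocabulary with completion and spelling hints."""

    def __init__(self, terms):
        # Sorted, de-duplicated, lower-cased terms allow binary search by prefix.
        self.terms = sorted({t.lower() for t in terms})

    def complete(self, prefix, limit=5):
        """Return up to `limit` terms that start with `prefix`."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:start + limit]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches

    def suggest(self, word, limit=3, cutoff=0.7):
        """Return vocabulary terms that closely resemble a possibly misspelled word."""
        return difflib.get_close_matches(
            word.lower(), self.terms, n=limit, cutoff=cutoff
        )


vocab = Vocabulary(["hemoglobin", "hemophilia", "hepatocyte", "histone"])
print(vocab.complete("hemo"))
print(vocab.suggest("hemoglobbin"))
```

A search front end could call `complete` as the user types and fall back to `suggest` when a submitted term is not in the vocabulary, so that a misspelling produces a correction rather than an empty result.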
Utilities
Many of the generic utilities originally intended to extend the functionality of browsers can be used to
facilitate searching molecular biology databases. These utilities include connection optimizers,
browser add-ons, personal firewalls, file-transfer programs, and download managers. Connection
optimizers are designed to improve Internet connection speed and reliability. Optimizers work by
allowing manual override of network communications configuration settings so that the connection
throughput can be optimized for sequence data (text strings), 3D protein structures (graphics), or
a specific combination of data formats.
Browser extensions enhance browsers with features such as automatic form-filling, searching
within a document, dictionary tools that define or complete the spelling of words on the fly,
visual previews of Web pages before they are accessed, and buttons for
frequently accessed sites. Privacy and security utilities include personal firewalls that
take up where network firewalls leave off. They block advertisements, cookies, and other nuisances
that can interfere with the efficient use of a browser-based search engine.
Download managers are intended to accelerate the retrieval of files by opening multiple connections to one or
more servers simultaneously, grabbing different parts of the file through each connection and
reassembling the file on the workstation. File-transfer managers extend standard FTP clients
by adding security through encryption and by providing users with a graphical user
interface instead of a command-line prompt. Most of these utilities are available for Windows, Linux,
and UNIX platforms.
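The multi-connection technique used by download managers can be sketched as follows: split the file into byte ranges, fetch each range concurrently, and join the pieces in order. This is a simplified illustration rather than the design of any specific product. The `fetch_range` callable is a hypothetical stand-in for a real transfer (for example, an HTTP request carrying a `Range` header), and here it simply slices an in-memory byte string so that the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, parts):
    """Return inclusive (start, end) byte ranges that cover 0 .. size-1."""
    if size <= 0:
        return []
    parts = max(1, min(parts, size))
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def segmented_download(fetch_range, size, connections=4):
    """Fetch each range on its own worker thread and reassemble in order."""
    ranges = split_ranges(size, connections)
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # pool.map yields results in the order of `ranges`, so joining is safe.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


# Simulated remote file: a made-up sequence held in memory.
remote_file = b"ACGT" * 1000 + b"N"


def fake_fetch(start, end):
    """Stand-in for a ranged network request; returns bytes start..end inclusive."""
    return remote_file[start:end + 1]


local_copy = segmented_download(fake_fetch, len(remote_file), connections=4)
print(local_copy == remote_file)
```

In a real client the total file size would first have to be learned from the server, and the server would need to support ranged requests; both are assumed here.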