Biomedical Engineering Reference
Search Engines and Knowledge Management
The ability to search through a molecular biology database assumes that an effective knowledge
management process is in place. Using the DNA sequencing process as an example, consider the
steps involved in making sequence data available to a researcher through a search engine. First,
there is the lengthy process of acquiring the data from a sequencing machine. This involves identifying
a set of clones that span a region of the genome to be sequenced, making sets of smaller clones from
mapped clones, purifying DNA from the smaller clones, and finally setting up and performing the
sequencing using gel electrophoresis. Then there is the verification and annotation of the sequence
data. Annotation is especially critical, because it enables the sequence data to be accessed by name
and linked to other databases. In this way, researchers in other labs and in other fields can access
the sequence data. A newly discovered nucleotide sequence might be linked to (and linked from) a
protein database, an inherited disease database, and perhaps a drug interaction database, for
example. Ultimately, providing name and linking hooks to the new data facilitates discovery of
associations or links between different but related fields in a way that extends our knowledge.
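The role of annotation described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class names, the accession "SEQ0001", and the external identifiers are hypothetical and do not correspond to any real database schema. The point is that a verified sequence becomes useful to other labs once it carries a name and a set of links, and that those links can be followed in both directions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SequenceRecord:
    """A verified sequence plus the annotation that makes it findable."""
    accession: str   # hypothetical identifier assigned at annotation time
    name: str        # human-readable name used for lookup
    sequence: str
    # Outgoing links: external database name -> identifiers in that database.
    links: Dict[str, List[str]] = field(default_factory=dict)


class SequenceCatalog:
    """Toy catalog supporting access by name and by incoming link."""

    def __init__(self) -> None:
        self._by_name: Dict[str, SequenceRecord] = {}
        # (database, external id) -> accessions that link to it
        self._backlinks: Dict[Tuple[str, str], Set[str]] = {}

    def add(self, record: SequenceRecord) -> None:
        # Names are matched case-insensitively (an assumption of this sketch).
        self._by_name[record.name.lower()] = record
        for database, ids in record.links.items():
            for ext_id in ids:
                key = (database, ext_id)
                self._backlinks.setdefault(key, set()).add(record.accession)

    def find_by_name(self, name: str) -> Optional[SequenceRecord]:
        return self._by_name.get(name.lower())

    def linked_from(self, database: str, ext_id: str) -> List[str]:
        """Return accessions that an external entry links back to."""
        return sorted(self._backlinks.get((database, ext_id), set()))
```

A record added with links to a protein entry and a disease entry can then be found either by its name or by starting from the protein or disease entry, which is the "linked to (and linked from)" relationship in the text.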
As involved as this initial stage of knowledge management can be, it's a waste of time and resources
without a comprehensive knowledge management program. This includes a defined means of
transforming data for other purposes, such as using the data in a tightly linked secondary database
of clinical disease. It also includes archiving data so that they can be recovered in the event of failure
in the primary database system, and providing the infrastructure capable of tracking the location of
particular data elements and of controlling access to the data.
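As a rough sketch of three of the components listed above (transforming data for a secondary database, archiving for recovery, and tracking where data elements live), consider the following. All function names, field names, and store names here are hypothetical, and JSON is used only as a convenient stand-in for whatever archive format a real system would use. Access control is omitted.

```python
import json
from typing import Dict, List, Set


def transform_for_clinical(record: dict) -> dict:
    """Project a primary record onto the fields a secondary clinical
    database might need (the field choice is an assumption)."""
    return {
        "accession": record["accession"],
        "disease_ids": list(record.get("disease_ids", [])),
    }


def archive(records: List[dict]) -> str:
    """Serialize records into a snapshot that can be stored elsewhere."""
    return json.dumps(records, sort_keys=True)


def restore(snapshot: str) -> List[dict]:
    """Rebuild records from a snapshot after a primary-system failure."""
    return json.loads(snapshot)


class LocationRegistry:
    """Tracks which stores hold a copy of each data element."""

    def __init__(self) -> None:
        self._where: Dict[str, Set[str]] = {}

    def register(self, accession: str, store: str) -> None:
        self._where.setdefault(accession, set()).add(store)

    def locate(self, accession: str) -> List[str]:
        return sorted(self._where.get(accession, set()))
```

In this sketch, a record archived with `archive` and recovered with `restore` is identical to the original, and the registry can report every store that holds a given accession.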
Although every component of the knowledge management process is critical, the data that are
managed are of little value unless they can be easily accessed in a timely manner. From a practical
perspective, knowledge management should support the retrieval of data from an online database
with a search engine while making provision for security through user authentication or other
methods. As such, the factors that affect usability define the value of the system. These include the
quality and appropriateness of the user interface, the vocabulary used to index and retrieve data,
ease of use, ease of learning, and the time required to search for and retrieve specific data.
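Two of the factors in this paragraph, the vocabulary used to index and retrieve data and the provision for user authentication, can be illustrated with a minimal sketch. The vocabulary entries, record identifiers, and token below are invented for illustration, and a real system would use a proper authentication mechanism rather than a set of accepted tokens. The sketch maps synonyms to a preferred term at both indexing and query time, so that a user who searches with a different word than the indexer used still retrieves the record.

```python
import time
from typing import Dict, List, Set, Tuple

# Hypothetical controlled vocabulary: synonym -> preferred index term.
VOCABULARY = {
    "zebra fish": "zebrafish",
    "danio rerio": "zebrafish",
    "house mouse": "mouse",
}

# Hypothetical records and accepted user tokens, for illustration only.
RECORDS = {
    "REC1": "putative homology between human and zebrafish genes",
    "REC2": "mouse protein sequence",
}
AUTHORIZED_TOKENS = {"demo-token"}


def normalize(term: str) -> str:
    """Map a term to its preferred form in the controlled vocabulary."""
    term = term.strip().lower()
    return VOCABULARY.get(term, term)


def build_index(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build an inverted index from preferred term to record identifiers."""
    index: Dict[str, Set[str]] = {}
    for rec_id, text in records.items():
        for word in text.lower().split():
            index.setdefault(normalize(word), set()).add(rec_id)
    return index


def search(index: Dict[str, Set[str]], query: str, token: str) -> Tuple[List[str], float]:
    """Return matching record ids and the elapsed lookup time in seconds."""
    if token not in AUTHORIZED_TOKENS:
        raise PermissionError("unknown user token")
    start = time.perf_counter()
    hits = sorted(index.get(normalize(query), set()))
    elapsed = time.perf_counter() - start
    return hits, elapsed
```

With this index, a query for "Danio rerio" retrieves the record indexed under "zebrafish", while a query made with an unrecognized token is refused before any lookup takes place.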
As described earlier, using one of the integrated database systems such as Entrez, SRS, or BioKRIS
can significantly reduce the time and difficulty associated with performing a successful search.
Although having databases online facilitates link integration through the search process, the interface
challenges begin at the time databases are first defined. The issue with creating databases of any
type is that they are necessarily defined for a particular use. For example, the HomoloGene online
database is optimized to manage putative homologies among the human, mouse, rat, and zebrafish
genomes, whereas SWISS-PROT is optimized to locate protein sequence data. Moving outside of the
molecular biology arena, the online professional databases including LexisNexis, Dialog, and Ingenta
each provide comprehensive, efficient access to information in their domains. Similarly, PAC provides
integration of life-science journal literature in a common format and in a single repository, providing
a single, unified access portal to scientific literature instead of a combination of links to disparate
databases, each with its own idiosyncrasies in vocabulary and infrastructure.
Information technology challenges aside, there is a limit to how far systems like Entrez can be
refined, because of our incomplete understanding of how databases can and should be linked. For
example, molecular biology has yet to fully explain how single genes can code for multiple proteins or
how all of the proteins in the human proteome interact with each other and the cellular environment
under various conditions. That said, the future of bioinformatics lies clearly in the integration of
disparate databases in molecular biology as well as with those in other fields to provide a unified view
of life.
As an illustration of the degree of linking that will eventually be needed to even approximate this
unified view, consider the experiences (which can be represented by links) typical of physician
training in the United States. As listed in Table 4-4, the traditional pre-medical curriculum includes
the basic sciences, including chemistry, physics, and genetics. Medical school provides exposure to
pre-clinical studies such as physiology and anatomy, followed by clinical exposure to everything from
 
 