Databases - Bioinformatics Computing

Database Technology

The purpose of a database is to facilitate the management of data, a process that depends on people,
processes, and, as described here, the enabling technology. Consider that the thousands of base pairs
discovered every minute by the sequencing machines in public and private laboratories would be
practically impossible to record, archive, and either publish or sell to other researchers without
computer databases. At the current stage of database technology evolution, bioinformatics data
are housed on large hard drives in locker- or refrigerator-sized local servers and in online sequence
databases such as GenBank. Thanks to modern computer technology, a bioinformatics
researcher can compare and contrast the genomes of a dozen species while sitting on the beach with
a laptop computer connected through a wireless modem to the Internet. While this image makes for
good advertising copy, in practice most researchers are tied to wet laboratories that generate,
manipulate, and store vast quantities of experiment-specific data. In this context, database
technology empowers researchers to store their data so that they can be quickly and easily
accessed, manipulated, compared to other data, and shared with other researchers.
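
As a rough illustration of storing data so that they can be accessed and compared, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory database. The table layout, the sample identifiers, and the sequences are all hypothetical, invented for this example; they do not come from GenBank or any real experiment.

```python
import sqlite3

# In-memory database for the sketch; a laboratory would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    " sample_id TEXT PRIMARY KEY,"
    " organism TEXT NOT NULL,"
    " bases TEXT NOT NULL)"
)

# Hypothetical records, invented for illustration only.
rows = [
    ("S001", "E. coli", "ATGGCCATTGTA"),
    ("S002", "E. coli", "ATGGCGATTGTA"),
    ("S003", "S. cerevisiae", "ATGTTTGGCTAA"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
conn.commit()


def sequences_for(organism):
    """Access: fetch every stored (sample_id, bases) pair for one organism."""
    cur = conn.execute(
        "SELECT sample_id, bases FROM sequences "
        "WHERE organism = ? ORDER BY sample_id",
        (organism,),
    )
    return cur.fetchall()


def mismatches(a, b):
    """Compare: count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


ecoli = sequences_for("E. coli")
print(ecoli)
print(mismatches(ecoli[0][1], ecoli[1][1]))
```

The query retrieves records by organism, and the comparison is a naive position-by-position count rather than a real alignment; sharing would amount to giving other researchers access to the same database.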

The concept of a database is necessarily colored by the current state of the technology. Just as a
state-of-the-art bioinformatics workstation, operating at gigahertz clock speeds with a gigabyte or
more of RAM and banks of hundred-gigabyte hard drives, would easily outperform one of the early
supercomputers, database technology is constantly evolving. Within our lifetimes, the contents of
GenBank will easily fit into the working memory of a handheld computer, and our concept of what
constitutes a "large" database will have to be adjusted accordingly. Even so, there is more to the
concept of a database (whether it's referred to as a repository, data warehouse, data mart, or local
database) than raw capacity.

The volatility of the data, the concept of working memory, and the interrelatedness of data,
regardless of the volume of data involved, are distinguishing features of the various forms of memory
systems or databases. For example, from the perspective of working memory, the function of a data
warehouse is to move data from a variety of sources and prepare the data for incorporation into
working memory. Similarly, a data warehouse or other database is distinguished from an archive in
that the data in an archive are much further removed from working memory. An archive might be
stored on optical platters, magnetic tapes, or other media that are held in an offsite fireproof safe or
underground building. Furthermore, the archive is typically engineered for longevity and the ability to
be reconstituted, and not for speed of access. A database, in contrast, is a live, working system that
forms the centerpiece for biotech R&D activities.
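
The idea that these systems differ mainly in how far their data sit from working memory can be sketched as an ordered list of tiers. This is a simplified model under stated assumptions: the `Tier` class, the `fetch` function, the relative cost figures, and the stored key are all hypothetical, and treating the data warehouse as a plain storage tier glosses over its role in gathering and preparing data.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    relative_cost: int  # hypothetical access cost in arbitrary units
    store: dict = field(default_factory=dict)


def build_hierarchy():
    """Return tiers ordered from nearest to furthest from working memory."""
    return [
        Tier("working memory", 1),
        Tier("database", 10),
        Tier("data warehouse", 100),
        Tier("archive", 10_000),
    ]


def fetch(tiers, key):
    """Search tiers nearest-first and copy a hit into working memory.

    Returns (value, cost), where cost is the summed relative_cost of
    every tier that had to be searched.
    """
    cost = 0
    for tier in tiers:
        cost += tier.relative_cost
        if key in tier.store:
            value = tier.store[key]
            tiers[0].store[key] = value  # now immediately available
            return value, cost
    raise KeyError(key)


tiers = build_hierarchy()
tiers[3].store["old_run"] = "ACGTACGT"  # hypothetical archived record

first = fetch(tiers, "old_run")   # must reach all the way to the archive
second = fetch(tiers, "old_run")  # served from working memory
print(first, second)
```

The first lookup pays the cost of every tier down to the archive, while the second is served from working memory, which mirrors the distinction between an archive built for longevity and a live database built for speed of access.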

Functionally, the relationship between various database technologies can be compared to the
information stored in the body, as depicted in Figure 2-11. Just as it's inefficient to have papers
strewn about an office, out of order, difficult to identify, and distracting the user's attention from the
documents that should be addressed, our genetic information is stored in the genome, tightly
packed, out of harm's way, and yet accessible. The data are there, as in an archive, but not
immediately available. Focusing on the individual chromosomes, data are more readily available, but
still packed away so that they don't interfere with cellular processes. As subsets of data are moved
out of the chromosome to the work environment, through the process of transcription, data are more
readily available for use. Finally, at the translation stage, the data serve as the basis for the current
work (as data do for computer applications), whether creating proteins according to the Central
Dogma, or attempting to locate a matching gene in a pattern-matching application.

Figure 2-11. Organic Analog of Database Hierarchy. The database hierarchy
has many parallels to the hierarchy in the human genome. Data stored in
chromosomes, like a data archive, must be unpacked and transferred to a
more immediately useful form before the data can be put to use.
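
The transcription and translation steps in the analogy can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes the input is the coding strand of a gene, so transcription reduces to replacing T with U, and it uses a partial codon table that covers only the codons in the example (the assignments themselves follow the standard genetic code). The function names and the input sequence are hypothetical.

```python
# Partial codon table covering only the codons used in this example.
# None marks a stop codon.
CODON_TABLE = {
    "AUG": "M",   # methionine (start)
    "GCC": "A",   # alanine
    "UGG": "W",   # tryptophan
    "UAA": None,  # stop
}


def transcribe(coding_dna):
    """Move data out of the 'archive': coding-strand DNA to mRNA."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Put the data to work: read codons in frame until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon not in table
        if residue is None:
            break
        protein.append(residue)
    return "".join(protein)


gene = "ATGGCCTGGTAA"  # hypothetical sequence, invented for illustration
mrna = transcribe(gene)
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # MAW
```

Each step moves the data into a form closer to use, in the same way that data pass from an archive through a database into working memory.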
