used for program execution, whereas an expansive disk or other non-
volatile memory serves as a container for data that can't fit in working
memory.
Volatility, working memory, and the volume of data that can be handled are key variables in memory
systems such as databases. In addition, there is the quality of interrelatedness; just as the genes in
the chromosomes are associated with each other by virtue of their physical proximity, the data in a
database are interrelated in a way that facilitates use for specific applications. For example,
nucleotide sequences that will be used in pattern-matching operations in the online sequence
databases will be formatted according to the same standard—such as the FASTA standard.
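
To make the FASTA convention concrete, a record consists of a header line beginning with ">" followed by one or more lines of sequence. The short Python sketch below parses records in that format; the file name and the script itself are illustrative, not part of any particular database's tooling.

# Minimal FASTA reader: yields (header, sequence) pairs.
# The file name "sequences.fasta" is a placeholder.
def read_fasta(path):
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):          # a header line starts a new record
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:                        # sequence lines are concatenated
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

if __name__ == "__main__":
    for header, sequence in read_fasta("sequences.fasta"):
        print(header, len(sequence))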
As reflected in the data life-cycle model discussed earlier, the data-archiving process involves
indexing the data, selecting the appropriate software to manage the archive, and choosing the type of
media as a function of the frequency of use and the expected useful life span of the data. From an
implementation perspective, the key issues in selecting one archiving technology over another depend on the size of
the archive, the types of data and data sources to be archived, the intended use, and any existing or
legacy archiving systems involved. For example, the size of the archive is measured in terms of the
number of items and the space requirements per item. Text-only archives of nucleotide or amino acid
sequences generally require less space per item than archives of 3D images of protein molecules and
other multimedia. Not only are space requirements generally much greater for multimedia data than
they are for text, but images usually require additional keywords and text associated with them so
that they can be readily located in an archive.
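
As a rough illustration of sizing an archive by the number of items and the space per item, the following Python sketch uses invented placeholder figures (they are not values from this text) to contrast a text-only sequence archive with a multimedia archive:

# Back-of-the-envelope archive sizing: total space = items x average bytes per item.
# All figures below are hypothetical placeholders.
archive_profiles = {
    "nucleotide_sequences": {"items": 500_000, "avg_bytes": 2_000},       # text-only records
    "protein_3d_images":    {"items": 10_000,  "avg_bytes": 50_000_000},  # multimedia records
}

for name, profile in archive_profiles.items():
    total_gb = profile["items"] * profile["avg_bytes"] / 1e9
    print(f"{name}: roughly {total_gb:,.1f} GB")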
Similarly, a single source of data is generally much easier to work with than data from multiple,
disparate sources in different and often non-compatible formats. In addition, hardware and software
used in the archiving process should reflect the intended use of the data. For example, seldom-used
data can be archived using a much less powerful system, compared to data that must be accessed
frequently. Finally, it's rare to have the opportunity to initiate a digital archiving program from
scratch. Normally, there is some form of existing (legacy) system in place whose data has to be
converted to be suitable for archiving.
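
One possible sketch of such a conversion step, assuming the legacy records happen to be GenBank flat files and that the Biopython library is installed, uses Bio.SeqIO to rewrite them in a common archive format such as FASTA; the file names are placeholders:

# Convert a legacy GenBank flat file into FASTA for archiving.
# Requires Biopython; file names are placeholders.
from Bio import SeqIO

count = SeqIO.convert("legacy_records.gb", "genbank",
                      "archive_records.fasta", "fasta")
print(f"Converted {count} records")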
The simplest approach to managing bioinformatics data in a small laboratory is to establish a file
server that is regularly backed up to a secure archive. To use the hardware most effectively,
everyone connected to the server copies their files from their local hard drive to specific areas on the
server's hard drive on a daily basis. The data on the server are in turn archived to magnetic tape or
other high-capacity media by someone assigned to the task. In this way, researchers can copy the
files from the server to their local hard drive as needed. Similarly, if the server hardware fails for some
reason, then the archive can be used to reconstitute the data on a second server.
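
A minimal sketch of this routine, assuming a POSIX system with rsync installed and using purely illustrative paths, might look like the following Python script (in practice the two steps would typically be scheduled, for example with cron):

# Step 1: mirror a researcher's local data to the lab file server.
# Step 2: snapshot the server share into a dated archive staging area.
# Paths are illustrative placeholders.
import datetime
import subprocess

LOCAL_DATA   = "/home/researcher/data/"
SERVER_SHARE = "/mnt/labserver/researcher/"
ARCHIVE_ROOT = "/mnt/tape_staging/"

subprocess.run(["rsync", "-a", LOCAL_DATA, SERVER_SHARE], check=True)

stamp = datetime.date.today().isoformat()
subprocess.run(["rsync", "-a", SERVER_SHARE, ARCHIVE_ROOT + stamp + "/"], check=True)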
As noted earlier, from a database perspective, file servers used as archives have several limitations.
For example, because the data may be created using different applications, perhaps using different
formats and operating systems, searching through the data may be difficult from any single
interface other than the search function built into the computer's operating system. Even