Databases - Bioinformatics Computing

Commonly used commercial DBMS packages in bioinformatics include products from Microsoft,

Oracle, Sybase, IBM, MySQL AB, and InterSystems. In addition, there are dozens of proprietary and

academic systems developed for particular niche applications that many bioinformatics researchers

employ as well. Regardless of whether the technology is rooted in academia or business, virtually

every DBMS can be described using three levels of abstraction: the physical database, the conceptual

database, and the views. These abstractions let researchers manipulate huge amounts of data that

may be associated in very complex ways, because they shield database designers and users from

the underlying complexity of computer hardware. The physical

database is the low-level data and framework that is defined in terms of media, bits, and bytes. This

low-level abstraction is most useful for anyone who has to deal directly with data and files.

The conceptual database, at a somewhat higher level of abstraction than the physical database, is

concerned with the most appropriate way to represent the data. This level of abstraction more closely

approximates the needs of database designers who deal with DBMS data representation and

efficiency issues such as the data-dictionary design. The conceptual database is defined in terms of

data structures (an organizational scheme, such as a record) and the properties of the data to be

stored and manipulated. The most common methods of representing the conceptual database are the

entity-relationship model and the data model.
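
The three levels of abstraction described above can be sketched with a small SQLite database. This is only an illustration under stated assumptions: the file name, the `protein` table, the `protein_summary` view, and the toy rows are hypothetical and do not come from any real bioinformatics system.

```python
import os
import sqlite3
import tempfile

# Hypothetical file, table, and view names, chosen only for illustration.
db_path = os.path.join(tempfile.mkdtemp(), "proteins.db")
conn = sqlite3.connect(db_path)

# Conceptual database: the record structure and the properties of its fields.
conn.execute(
    "CREATE TABLE protein ("
    " protein_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " weight_kda REAL,"
    " sequence TEXT NOT NULL)"
)
# Toy rows, invented for this sketch.
conn.executemany(
    "INSERT INTO protein (name, weight_kda, sequence) VALUES (?, ?, ?)",
    [("toy_protein_a", 12.5, "MKTAYIAK"), ("toy_protein_b", 48.0, "MGSSHHHH")],
)

# View: a restricted presentation of the same data for one group of users.
conn.execute(
    "CREATE VIEW protein_summary AS SELECT name, weight_kda FROM protein"
)
conn.commit()

summary_rows = conn.execute(
    "SELECT name, weight_kda FROM protein_summary ORDER BY name"
).fetchall()

# Physical database: the same data seen as bytes in a file on disk.
physical_size_bytes = os.path.getsize(db_path)
conn.close()

print(summary_rows)
print(physical_size_bytes, "bytes on disk")
```

A user of `protein_summary` never sees the `sequence` column or the file layout, which is the kind of shielding the abstractions are meant to provide.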

The entity-relationship model focuses on entities and their interrelationships in a way that parallels

how we categorize the world. For example, common database entities in bioinformatics are the

human being, protein sequences, nucleotide sequences, and disease processes about which data are

recorded. Every entity has basic attributes, such as name, size, weight (a particular

protein may have a known weight), or charge. Relationships within the model are classified according

to how data are associated with each other, such as one-to-one, one-to-many, or many-to-many. For

example, a length of DNA may be transcribed into one mRNA sequence (a one-to-one relationship) and

a gene may give rise to several proteins (a one-to-many relationship). These and other relationships

can be used to maintain the integrity of data. For example, a gene (one entity) may generate more

than one protein, but the gene, having a one-to-one relationship with a nucleotide sequence,

shouldn't be associated with more than one nucleotide sequence. The data model can enforce this

one-to-one relationship.
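
The gene example above can be sketched as a relational schema in which the database itself rejects a second nucleotide sequence for the same gene. This is a minimal sketch, assuming SQLite as the DBMS; the table names, column names, and toy rows are hypothetical. The one-to-one relationship is expressed with a `UNIQUE` foreign key, and the one-to-many relationship with an ordinary foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

# Hypothetical schema: three entities and two relationships.
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- One-to-one: UNIQUE allows at most one sequence row per gene.
CREATE TABLE nucleotide_sequence (
    sequence_id INTEGER PRIMARY KEY,
    gene_id     INTEGER NOT NULL UNIQUE REFERENCES gene(gene_id),
    bases       TEXT NOT NULL
);
-- One-to-many: several protein rows may point at the same gene.
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
    name       TEXT NOT NULL
);
""")

# Toy data, invented for this sketch.
conn.execute("INSERT INTO gene (gene_id, name) VALUES (1, 'toy_gene')")
conn.execute("INSERT INTO nucleotide_sequence (gene_id, bases) VALUES (1, 'ATGGCC')")
conn.executemany(
    "INSERT INTO protein (gene_id, name) VALUES (?, ?)",
    [(1, "toy_protein_1"), (1, "toy_protein_2")],
)

# A second sequence for the same gene violates the one-to-one rule.
try:
    conn.execute("INSERT INTO nucleotide_sequence (gene_id, bases) VALUES (1, 'ATGTTT')")
    second_sequence_rejected = False
except sqlite3.IntegrityError:
    second_sequence_rejected = True

protein_count = conn.execute(
    "SELECT COUNT(*) FROM protein WHERE gene_id = 1"
).fetchone()[0]
sequence_count = conn.execute(
    "SELECT COUNT(*) FROM nucleotide_sequence WHERE gene_id = 1"
).fetchone()[0]

print(second_sequence_rejected, protein_count, sequence_count)
```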

The conceptual database can also be represented as a data model. Like entity-relationship models,

data models provide a means of representing and manipulating large amounts of data. A data model

consists of two components—a mathematical notation for expressing data and relationships, and

operations on the data that serve to express manipulations of the data. Like entity-relationship

models, data models may also contain a collection of integrity rules that define valid data

relationships. These various components work together to provide a formal means of representing

and manipulating data.
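
The components listed above (a notation for data, operations on the data, and integrity rules) can be sketched in a few lines of Python. This is an illustrative sketch in the style of the relational model, not the definition used by any particular DBMS; the function names `select`, `project`, `join`, and `is_key`, and the toy rows, are assumptions made for the example.

```python
from typing import Callable, Dict, List

# Notation: a relation is a list of rows, and a row maps attribute names to values.
Row = Dict[str, object]
Relation = List[Row]


def select(relation: Relation, predicate: Callable[[Row], bool]) -> Relation:
    """Operation: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation: Relation, attributes: List[str]) -> Relation:
    """Operation: keep only the named attributes, dropping duplicate rows."""
    result: Relation = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left: Relation, right: Relation, key: str) -> Relation:
    """Operation: combine rows from both relations that agree on the key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def is_key(relation: Relation, attribute: str) -> bool:
    """Integrity rule: the attribute must be unique across all rows."""
    values = [row[attribute] for row in relation]
    return len(values) == len(set(values))


# Toy data, invented for this sketch.
genes = [{"gene_id": 1, "gene": "geneA"}, {"gene_id": 2, "gene": "geneB"}]
proteins = [
    {"protein": "protA1", "gene_id": 1, "weight_kda": 12.5},
    {"protein": "protA2", "gene_id": 1, "weight_kda": 48.0},
    {"protein": "protB1", "gene_id": 2, "weight_kda": 30.0},
]

heavy = select(proteins, lambda row: row["weight_kda"] > 20)
result = project(join(genes, heavy, "gene_id"), ["gene", "protein"])

print(result)
print(is_key(genes, "gene_id"), is_key(proteins, "gene_id"))
```

Because each operation takes relations and returns a relation, the operations compose freely, which is what makes a formal notation of this kind useful for expressing manipulations of the data.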

The most common data models supported by DBMS products are flat, network, hierarchical,

relational, object-oriented, and deductive data models, as illustrated graphically in Figure 2-17. Even

though long strings of sequencing data lend themselves to a flat file representation, the relational

database model is by far the most popular in the commercial database industry and is found in

virtually every biotech R&D laboratory. Nevertheless, nearly every data model illustrated in Figure 2-17

has applications in bioinformatics, from flat to semi-structured.

Figure 2-17. Data Models. The most common data models in bioinformatics

are relational, flat, and object-oriented.
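
The contrast between a flat representation and a relational one can be sketched by reading sequence records from flat text and loading them into a table. This is a minimal sketch under stated assumptions: the flat text uses a FASTA-style layout (a `>` header line followed by sequence lines), the `parse_flat` helper and the `sequence` table are hypothetical, and the records are toy data.

```python
import sqlite3
from typing import List, Tuple

# Toy flat-file content in a FASTA-style layout, invented for this sketch.
flat_text = """>seq1 toy record one
ATGGCC
TTAA
>seq2 toy record two
GGGCCC
"""


def parse_flat(text: str) -> List[Tuple[str, str, str]]:
    """Return (identifier, description, bases) for each record in the text."""
    records: List[Tuple[str, str, str]] = []
    identifier, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                records.append((identifier, description, "".join(chunks)))
            header = line[1:].split(maxsplit=1) or [""]
            identifier = header[0]
            description = header[1] if len(header) > 1 else ""
            chunks = []
        else:
            chunks.append(line)
    if identifier is not None:
        records.append((identifier, description, "".join(chunks)))
    return records


records = parse_flat(flat_text)

# Relational representation: the same records as rows that can be queried.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence ("
    " identifier TEXT PRIMARY KEY,"
    " description TEXT,"
    " bases TEXT NOT NULL)"
)
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", records)

long_ids = [
    row[0]
    for row in conn.execute(
        "SELECT identifier FROM sequence WHERE length(bases) > 6 ORDER BY identifier"
    )
]
print(records)
print(long_ids)
```

In the flat form, answering a question such as "which sequences are longer than six bases" means scanning and parsing the whole file; in the relational form, the same question is a single declarative query.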
