Biomedical Engineering Reference
Commonly used commercial DBMS packages in bioinformatics include products from Microsoft,
Oracle, Sybase, IBM, MySQL AB, and InterSystems. In addition, there are dozens of proprietary and
academic systems developed for particular niche applications that many bioinformatics researchers
employ as well. Regardless of whether the technology is rooted in academia or business, virtually
every DBMS can be described using three levels of abstraction: the physical database, the conceptual
database, and the views. The point of using these abstractions is that they allow researchers to
manipulate huge amounts of data that may be associated in very complex ways by shielding
database designers and users from the underlying complexity of computer hardware. The physical
database is the low-level data and framework that is defined in terms of media, bits, and bytes. This
low-level abstraction is most useful for anyone who has to deal directly with data and files.
The conceptual database, at a somewhat higher level of abstraction than the physical database, is
concerned with the most appropriate way to represent the data. This level of abstraction more closely
approximates the needs of database designers who deal with DBMS data representation and
efficiency issues such as the data-dictionary design. The conceptual database is defined in terms of
data structures (an organizational scheme, such as a record) and the properties of the data to be
stored and manipulated. The most common methods of representing the conceptual database are the
entity-relationship model and the data model.
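The three levels of abstraction can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the file name, the table and view names, and the two toy rows are hypothetical and do not come from any real system. The table definition stands in for the conceptual database, the view for the user-facing level, and the size of the file on disk for the physical database.

```python
import os
import sqlite3
import tempfile

# All names and values below are hypothetical, chosen only for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "proteins.db")
    conn = sqlite3.connect(db_path)

    # Conceptual level: the designer declares records and their properties.
    conn.execute(
        "CREATE TABLE protein ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " weight_kda REAL,"
        " sequence TEXT NOT NULL)"
    )

    # View level: a user sees only the columns relevant to one task.
    conn.execute(
        "CREATE VIEW protein_weights AS SELECT name, weight_kda FROM protein"
    )

    # Toy rows; the sequences and weights are placeholders, not real data.
    conn.executemany(
        "INSERT INTO protein (name, weight_kda, sequence) VALUES (?, ?, ?)",
        [("protA", 5.8, "MAAAGG"), ("protB", 8.6, "MKKLLV")],
    )
    conn.commit()

    view_rows = conn.execute(
        "SELECT name, weight_kda FROM protein_weights ORDER BY name"
    ).fetchall()
    conn.close()

    # Physical level: the same data is ultimately bytes in a file on some medium.
    physical_size = os.path.getsize(db_path)

print(view_rows)
print(physical_size, "bytes on disk")
```

The user querying protein_weights never needs to know how the table is laid out, and the designer defining the table never needs to know how the bytes are arranged in the file.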
The entity-relationship model focuses on entities and their interrelationships in a way that parallels
how we categorize the world. For example, common database entities in bioinformatics include
human subjects, protein sequences, nucleotide sequences, and disease processes about which data are
recorded. Every entity has basic attributes, such as name, size, weight (a particular
protein may have a known molecular weight), or charge. Relationships within the model are classified according
to how data are associated with each other, such as one-to-one, one-to-many, or many-to-many. For
example, a length of DNA may be transcribed into one mRNA sequence (a one-to-one relationship) and
a gene may give rise to several proteins (a one-to-many relationship). These and other relationships
can be used to maintain the integrity of data. For example, a gene (one entity) may generate more
than one protein, but the gene, having a one-to-one relationship with a nucleotide sequence,
shouldn't be associated with more than one nucleotide sequence. The data model can enforce this
one-to-one relationship.
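One way such relationships might be enforced in practice is with key constraints in a relational schema. The following is a minimal sketch using Python's built-in sqlite3 module; the table names, column names, and sample values are assumptions made for illustration. A UNIQUE foreign key models the one-to-one link between a gene and its nucleotide sequence, while an ordinary foreign key models the one-to-many link between a gene and its proteins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical schema, for illustration only.
conn.executescript("""
CREATE TABLE gene (
    id     INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL UNIQUE
);
-- One-to-one: UNIQUE on gene_id allows at most one sequence per gene.
CREATE TABLE nucleotide_sequence (
    id      INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL UNIQUE REFERENCES gene(id),
    bases   TEXT NOT NULL
);
-- One-to-many: gene_id is not unique, so one gene may have several proteins.
CREATE TABLE protein (
    id      INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    name    TEXT NOT NULL
);
""")

conn.execute("INSERT INTO gene (id, symbol) VALUES (1, 'GENE1')")
conn.execute(
    "INSERT INTO nucleotide_sequence (gene_id, bases) VALUES (1, 'ATGGCC')"
)
conn.executemany(
    "INSERT INTO protein (gene_id, name) VALUES (?, ?)",
    [(1, "isoform_a"), (1, "isoform_b")],
)

# A second sequence for the same gene violates the one-to-one rule.
try:
    conn.execute(
        "INSERT INTO nucleotide_sequence (gene_id, bases) VALUES (1, 'ATGTTT')"
    )
    second_sequence_rejected = False
except sqlite3.IntegrityError:
    second_sequence_rejected = True

protein_count = conn.execute(
    "SELECT COUNT(*) FROM protein WHERE gene_id = 1"
).fetchone()[0]

print(second_sequence_rejected, protein_count)
```

The database accepts both proteins for the gene but refuses the second nucleotide sequence, so the integrity rule is checked by the DBMS itself and not by application code.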
The conceptual database can also be represented as a data model. Like entity-relationship models,
data models provide a means of representing and manipulating large amounts of data. A data model
consists of two components—a mathematical notation for expressing data and relationships, and
operations on the data that serve to express manipulations of the data. Like entity-relationship
models, data models may also contain a collection of integrity rules that define valid data
relationships. These various components work together to provide a formal means of representing
and manipulating data.
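The components of a data model can be illustrated with a deliberately small sketch in plain Python. It assumes a relational-style model in which a relation is a list of records, and it borrows two standard relational operations, selection and projection, plus one key-uniqueness rule. The function names and sample records are hypothetical, and no real DBMS is implemented this simply.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Relation = List[Row]


def select_rows(relation: Relation, predicate: Callable[[Row], bool]) -> Relation:
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project_columns(relation: Relation, attributes: List[str]) -> Relation:
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


def violates_key(relation: Relation, key: str) -> bool:
    """Integrity rule: a key attribute must not repeat across rows."""
    values = [row[key] for row in relation]
    return len(values) != len(set(values))


# Toy relation with placeholder values.
proteins: Relation = [
    {"protein_id": 1, "gene": "geneA", "weight_kda": 12.5},
    {"protein_id": 2, "gene": "geneA", "weight_kda": 30.1},
    {"protein_id": 3, "gene": "geneB", "weight_kda": 8.0},
]

heavy = select_rows(proteins, lambda row: row["weight_kda"] > 10)
genes = project_columns(proteins, ["gene"])

print([row["protein_id"] for row in heavy])
print(genes)
print(violates_key(proteins, "protein_id"))
```

Here the list-of-records representation plays the role of the notation, the two functions are the operations, and violates_key is an example of an integrity rule.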
The most common data models supported by DBMS products are flat, network, hierarchical,
relational, object-oriented, and deductive data models, as illustrated graphically in Figure 2-17. Even
though long strings of sequencing data lend themselves to a flat file representation, the relational
database model is by far the most popular in the commercial database industry and is found in
virtually every biotech R&D laboratory. However, virtually every data model illustrated in Figure 2-17
has applications in bioinformatics, from flat to semi-structured.
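The contrast between a flat representation and a relational one can be sketched briefly. The example below assumes a hypothetical tab-separated flat file with one sequence record per line, loads the same records into an in-memory SQLite table, and answers the same question both ways. The field layout, identifiers, and sequences are invented for illustration.

```python
import io
import sqlite3

# Hypothetical flat file: one record per line, fields separated by tabs.
flat_file = io.StringIO(
    "seq1\tgeneA\tATGGCCATTG\n"
    "seq2\tgeneB\tATGTTTAAAC\n"
)
records = [line.rstrip("\n").split("\t") for line in flat_file if line.strip()]

# Relational form of the same records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (seq_id TEXT PRIMARY KEY, gene TEXT, bases TEXT)"
)
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", records)

# Flat access: the program scans every record and inspects a field position.
flat_hit = [fields[0] for fields in records if fields[1] == "geneB"]

# Relational access: the query names the attribute and the DBMS finds the rows.
relational_hit = [
    row[0]
    for row in conn.execute(
        "SELECT seq_id FROM sequence WHERE gene = ?", ("geneB",)
    )
]

print(flat_hit, relational_hit)
```

Both approaches return the same record. The flat version depends on the position of each field in the line, whereas the relational version depends only on attribute names.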
Figure 2-17. Data Models. The most common data models in bioinformatics
are relational, flat, and object-oriented.