Databases - Bioinformatics Computing

Indexing

Indexing methodology, including selection and use of the most appropriate controlled vocabulary

Integration

Integration with other databases

Intellectual Property

Ownership of sequence data, images, and other data stored in and communicated through the database

Interfaces

Connectivity with other databases and applications

Legacy Systems

How to deal with legacy data and databases

Licensing

For vendor-supplied database systems, the most appropriate licensing arrangement

Life Span

The mean time between failures (MTBF) for the hardware, as well as the likely useful life of the data

Load Testing

The maximum number of simultaneous users that can be supported by the DBMS

Maintenance

Cost and resource requirements

Media

The most appropriate disk, tape cartridges, and CD-ROM media

Normalization

Avoiding errors by representing data one way, one time, and in one place

Operating Environment

Ensuring proper power and operating temperature and humidity

Operating System

UNIX, Linux, Windows, MacOS, or mini/mainframe OS

Output

Format of database output

Performance

Access time and data throughput

Privacy

Provision for preserving confidentiality of data

Query Language

Proprietary or standard query language

Redundancy

Hot backups, shadowing, and RAID systems

Resource Requirements

Hardware, software, and operating and development personnel

Scalability

Ability to handle greater data volume with added hardware and/or software upgrades

Security

Limits on user access, from username-password combinations to biometrics, as well as encryption of sessions

Stand-Alone vs. Network

Stand-alone vs. networked operation, and multi-user vs. single-user access

Standards

From media format to operating system, query language, and data models

Utilities

Availability of software tools for data recovery

Vendor Viability

Commercial viability of the hardware and software vendors supplying database tools and platform
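
The Normalization entry above (representing data one way, one time, and in one place) can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for a full DBMS; the table names, column names, and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database used purely for illustration.
# All table names, column names, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE sequence_record (
        sequence_id INTEGER PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
        residues    TEXT NOT NULL
    );
""")

# The organism name is stored once (deliberately misspelled here).
conn.execute("INSERT INTO organism (organism_id, name) VALUES (1, 'Homo sapien')")

# Sequence records refer to the organism by key instead of repeating its name.
conn.executemany(
    "INSERT INTO sequence_record (organism_id, residues) VALUES (?, ?)",
    [(1, "ATGGCC"), (1, "TTGACA")],
)

# Because the name lives in one place, a single update corrects it
# for every record that refers to it.
conn.execute("UPDATE organism SET name = 'Homo sapiens' WHERE organism_id = 1")

rows = conn.execute("""
    SELECT s.residues, o.name
    FROM sequence_record AS s
    JOIN organism AS o ON o.organism_id = s.organism_id
    ORDER BY s.sequence_id
""").fetchall()
print(rows)
```

Had the organism name been copied into every sequence row, the same correction would have required touching each row, with the risk of leaving some copies inconsistent.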

For example, a milestone in designing and implementing a database is defining the type of data to be stored. This decision then implies the most appropriate data model and type of DBMS to employ. If the data are nucleotide sequences, then a reasonable choice would be a semi-structured database based on XML-tagged text files. However, if the data are images of 3D protein structures and keywords, then either an object-oriented or a relational database would likely be more appropriate. Even though a representation in rows and columns may not be optimal for mapping protein structures onto a database, factors such as support from a commercial relational database vendor might dictate the use of a relational product.
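
To make the contrast between the two data models concrete, the following is a minimal sketch rather than a recommended design. It assumes Python's standard xml.etree.ElementTree and sqlite3 modules; the XML tag names, table names, file path, and keywords are all hypothetical, and the sequence is an illustrative fragment, not real data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Semi-structured model: a nucleotide sequence kept as XML-tagged text.
# The tag names and the record itself are hypothetical.
record_xml = """
<sequence_record id="demo-001">
  <organism>Example organism</organism>
  <description>Illustrative fragment, not real data</description>
  <residues>ATGGCCATTGTAATGGGCCGC</residues>
</sequence_record>
""".strip()

record = ET.fromstring(record_xml)
seq_id = record.get("id")
residues = record.findtext("residues")
# A tag that is absent simply yields None; no schema change is needed.
note = record.findtext("note")

# Relational model: image metadata and keywords held in rows and columns.
# The image file stays on disk; only its path is stored (hypothetical path).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE structure_image (
        image_id  INTEGER PRIMARY KEY,
        file_path TEXT NOT NULL
    );
    CREATE TABLE image_keyword (
        image_id INTEGER NOT NULL REFERENCES structure_image(image_id),
        keyword  TEXT NOT NULL
    );
""")
conn.execute(
    "INSERT INTO structure_image (image_id, file_path) VALUES (?, ?)",
    (1, "images/demo_structure.png"),
)
conn.executemany(
    "INSERT INTO image_keyword (image_id, keyword) VALUES (?, ?)",
    [(1, "kinase"), (1, "alpha helix")],
)

# Keyword search is an ordinary join in the relational model.
matches = conn.execute(
    """
    SELECT i.file_path
    FROM structure_image AS i
    JOIN image_keyword AS k ON k.image_id = i.image_id
    WHERE k.keyword = ?
    """,
    ("kinase",),
).fetchall()
print(seq_id, residues, note, matches)
```

The XML record tolerates optional or missing fields without any change to a schema, whereas the relational tables fix the structure in advance and in return make keyword queries straightforward.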

Consider the process involved in creating a central data warehouse of a scale appropriate for the
