Databases - Bioinformatics Computing


Data Category: Examples

Data Sources: Patient, Clinical Studies, Genomic Studies, Public Databases, Private Databases
Applications: Search Engines, Statistical Analysis, Visualization, Simulation, Communications, Database Management System, Electronic Medical Record, Genomic
Databases: Public, Private, Taxonomy, Clinical, Genetic, Local, External, Archives
Data Formats: FASTA, PHYLIP, MAML, NEXUS, PAUP, FASTA+GAP, mmCIF, Proprietary Clinical Formats, Local Application Formats
Interfaces: Local Databases, Online Databases, Data Warehouse, Application Integration Tools, Data Dictionary, Network, Standards
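Of the data formats listed above, FASTA is among the simplest: each record is a header line beginning with ">" followed by one or more lines of sequence. As a minimal sketch only, a reader for it might look like the following. The function name parse_fasta and the sample records are illustrative assumptions, not part of any particular library or of the original text.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}.

    Hypothetical helper for illustration; production pipelines usually
    rely on an established library instead.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; the text after ">" is its header.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence data may span several lines; concatenate them.
            records[header] += line
    return records


# Invented sample data, used only to exercise the parser.
sample = """>seq1 example record
ACGTAC
GTTA
>seq2
TTGACA
""".splitlines()

sequences = parse_fasta(sample)
for name, seq in sequences.items():
    print(name, len(seq))
```

Most of the other formats in the list carry more structure (alignment blocks, metadata, or 3D coordinates), so a converter between them needs considerably more than this.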

Furthermore, many of the dozens of databases involved in pharmacogenomic research and development use proprietary formats. This is especially true of clinical systems, many of which are specialty-specific. For example, standard image formats for radiology databases include Digital Imaging and Communications in Medicine (DICOM) and the American College of Radiology/National Electrical Manufacturers Association (ACR/NEMA) standards. These standards were developed primarily to facilitate multi-vendor connectivity and to promote the development of Picture Archiving and Communications Systems (PACS), but they have no provision for linking images with genomic systems, such as gene expression databases.

The typical research laboratory must develop and maintain numerous interfaces between applications and databases to provide the logical connectivity for data communications through the network infrastructure. The simple network illustrated in Figure 2-2 glosses over the inner complexity of the dozens of standards used throughout a typical information system, a problem at least partially addressed by data dictionaries and conversion utilities. In practice, few laboratories or medical facilities provide the degree of connectivity suggested by this discussion. For example, the vast majority of hospitals in the U.S. use paper charts to record patient history and physical findings. Perhaps 5 percent of hospitals have a functional electronic medical record (EMR), and most of these are partial implementations that provide only summary information. Furthermore, these systems typically require researchers and clinicians to learn several arcane languages and procedures to access all data that may be relevant to a given patient. For example, clinicians may have to log in to a pathology system to check urinalysis results, a radiology system to read the report on a patient's latest imaging studies, and an admission, discharge, transfer (ADT) system to verify the patient's insurance provider. Similarly, although many clinical studies are multimedia-rich, most radiology and pathology images, EKG tracings, pulmonary function test curves, and other graphical materials are maintained in separate databases that aren't connected to the main hospital or clinic network.
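To make the role of a data dictionary concrete, the sketch below maps the local field names of three separate systems onto one shared vocabulary and merges the results into a single patient view. It is a minimal illustration under stated assumptions: the system names, field names, and sample values are all invented, and real clinical interfaces involve far more than renaming fields.

```python
# Hypothetical data dictionary: maps each system's local field names
# to shared canonical names. All system and field names are invented.
DATA_DICTIONARY = {
    "pathology": {"pt_id": "patient_id", "ua_result": "urinalysis"},
    "radiology": {"PatientID": "patient_id", "report_txt": "imaging_report"},
    "adt": {"mrn": "patient_id", "payer": "insurance_provider"},
}


def to_canonical(system, record):
    """Rename one system's fields to canonical names, dropping unmapped fields."""
    mapping = DATA_DICTIONARY[system]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def merge_patient_view(records_by_system):
    """Combine per-system records for one patient into a single dict."""
    merged = {}
    for system, record in records_by_system.items():
        canonical = to_canonical(system, record)
        pid = canonical.get("patient_id")
        # Refuse to merge records that do not agree on the patient.
        if "patient_id" in merged and pid != merged["patient_id"]:
            raise ValueError(f"patient ID mismatch in {system}")
        merged.update(canonical)
    return merged


# Invented sample records, one per system.
view = merge_patient_view({
    "pathology": {"pt_id": "P001", "ua_result": "normal"},
    "radiology": {"PatientID": "P001", "report_txt": "no acute findings"},
    "adt": {"mrn": "P001", "payer": "Example Health Plan"},
})
print(view)
```

Keeping the mappings in one table, rather than scattered through the code, means that a renamed field in one system requires a change in only one place.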

One approach to minimizing or hiding the complexity of the data-management process is to create a single, integrated user interface. Just as the Windows or Macintosh operating systems hide the complexity of computer operations from users, a unified user interface to a network of disparate applications can hide the complexity of the data sources and various applications used to manipulate the data. This unified user interface may take the form of a Web portal or the workstation's operating system. For example, the flavors of UNIX for the PC, Macintosh, and dedicated UNIX workstations each provide various views of local and networked applications. The challenge with hiding complexity this way is that the constant changes in how data are actually managed in the background require parallel updating of the user interface that provides a front end to the system.
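One way to picture such a unified front end is as a thin layer that forwards a single request to every back-end source. The sketch below is an assumption-laden illustration, not a description of any real portal: the class UnifiedPortal, its methods, and the stand-in pathology and radiology functions are all hypothetical.

```python
from typing import Callable, Dict

# A source adapter takes a patient ID and returns that source's data.
SourceAdapter = Callable[[str], dict]


class UnifiedPortal:
    """Hypothetical single front end over several independent data sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, SourceAdapter] = {}

    def register(self, name: str, adapter: SourceAdapter) -> None:
        # Re-registering a name replaces the old adapter, so a back-end
        # change can be absorbed without changing lookup().
        self._sources[name] = adapter

    def lookup(self, patient_id: str) -> Dict[str, dict]:
        """Query every registered source and return results keyed by source."""
        return {name: fetch(patient_id) for name, fetch in self._sources.items()}


# Stand-in back ends; real adapters would call the actual systems.
def pathology_v1(patient_id: str) -> dict:
    return {"urinalysis": "normal"}


def radiology_v1(patient_id: str) -> dict:
    return {"report": "no acute findings"}


portal = UnifiedPortal()
portal.register("pathology", pathology_v1)
portal.register("radiology", radiology_v1)
result = portal.lookup("P001")
print(result)
```

Isolating each source behind its own adapter does not remove the maintenance burden described above, but it confines each background change to the adapter for the affected source.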

The data-management process is much more involved than simply sending data to a database and retrieving it later. As discussed in the following sections, the databases used in bioinformatics research present a variety of challenges, many of which pertain to all phases of the data life cycle, including security, standards, interoperability, longevity of data, access and version control, the use of encryption, and minimizing access time. The data life cycle and the relevant issues that arise at each stage in the life of data are discussed in the rest of this chapter. Finally, issues that pertain to
