If a model is to be used to predict protein structure from sequence data, then an underlying assumption is that the amino acid sequence, bond length, bond angle, and related atomic-level data are not only available but also accurate to some verifiable level.
Given the underlying data and a conceptual model, the next phase of the modeling and simulation
process is translating the conceptual model into data structures and high-level descriptions of
computational procedures. Designing the computer model involves extracting from the conceptual
model only those characteristics of the original system that are deemed essential, as determined by
the model's ultimate purpose. For example, the purpose of predicting protein structure from sequence data may be to allow the end-user to visualize the overall structure, in which case a high degree of quantitative accuracy is not essential. In this example, the purpose of the model is to simplify and idealize, and the characteristics selected from the conceptual model should reflect this purpose.
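To make this concrete, consider a minimal sketch of such a design decision, assuming Python and a hypothetical visualization-oriented model: only the alpha-carbon trace is carried forward from the conceptual model, and atomic detail deemed nonessential for the purpose is deliberately dropped.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class CAlphaTrace:
        """Hypothetical data structure for a visualization-oriented model:
        only the alpha-carbon backbone trace is retained; side-chain atoms,
        bond angles, and other atomic detail are deliberately omitted."""
        sequence: str                                 # one-letter residue codes
        ca_coords: List[Tuple[float, float, float]]  # one (x, y, z) per residue

        def __post_init__(self):
            # Internal consistency: one coordinate per residue in the sequence.
            assert len(self.sequence) == len(self.ca_coords)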
Designing the computer model, like defining the problem space and conceptual modeling, is largely
an art. Designing a simple model that adequately mimics the behavior of the system or process
under study is a creative process that incorporates certain assumptions. The art of making good assumptions may well be the most challenging component of modeling, because success depends as much on the domain experience of the modeler as it does on the nature of the system to be modeled. Biological systems are seldom described in quantitative terms, often requiring that the model designer derive or invent the needed mathematical formalisms or heuristics.
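As an illustration of such an invented heuristic, a qualitative observation ("strongly hydrophobic residues tend to be buried") might be encoded as a simple scoring rule. The sketch below uses a subset of the Kyte-Doolittle hydrophobicity scale; the threshold is an arbitrary illustrative choice, not an established cutoff.

    # Subset of the Kyte-Doolittle hydrophobicity scale (positive = hydrophobic).
    HYDROPHOBICITY = {"I": 4.5, "V": 4.2, "L": 3.8, "A": 1.8,
                      "G": -0.4, "D": -3.5, "K": -3.9, "R": -4.5}

    def likely_buried(residue: str, threshold: float = 1.5) -> bool:
        """Heuristic: treat strongly hydrophobic residues as likely buried.
        The 1.5 threshold is an illustrative assumption."""
        return HYDROPHOBICITY.get(residue, 0.0) > threshold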
Coding of the computer model involves transferring the symbolic representations of the system into
executable computer code. Model coding marks the transition of the modeling process from an
artistic endeavor to a predominantly scientific one, defined by software engineering principles. Model
coding may involve working with a low-level computer language, such as C++, or a high-level shell
designed specifically for modeling and simulation.
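As a sketch of this transition, a symbolic statement from the conceptual model, such as "consecutive alpha-carbons sit roughly 3.8 angstroms apart," might be translated into an executable penalty function. The quadratic form and the weight below are illustrative assumptions, not a prescribed formulation.

    import math

    CA_CA_DISTANCE = 3.8  # approximate spacing of consecutive alpha-carbons, in angstroms

    def bond_penalty(coords, weight=1.0):
        """Quadratic penalty for deviations from the ideal CA-CA spacing;
        coords is a list of (x, y, z) alpha-carbon positions."""
        penalty = 0.0
        for a, b in zip(coords, coords[1:]):
            penalty += weight * (math.dist(a, b) - CA_CA_DISTANCE) ** 2
        return penalty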
Once a model is in the form of executable code, it should be subject to verification and validation.
Verification is the process of determining that the model coded in software accurately reflects the conceptual model, for example by testing the internal logic of the model to confirm that it functions as intended. The simulation system and its underlying model are validated by assessing whether the
operation of the software model is consistent with the real world, usually through comparison with
data from the system being simulated. For example, in a system designed to predict protein
structure, the validation process would include comparing model data with protein structure data
from NMR and X-ray crystallography. Validation against X-ray crystallography might involve comparing the model's output with the structure derived from bombarding the crystal lattice of a purified protein with X-rays. In contrast, validation against NMR might involve comparing the model's output with data produced by scanning a pure protein in solution.
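The distinction can be made concrete with a minimal sketch. Verification exercises the internal logic on synthetic cases with known answers; validation compares model output against experimentally determined coordinates, here via a root-mean-square deviation. The 2.0-angstrom cutoff is an illustrative threshold, not an accepted standard.

    import math

    def rmsd(predicted, experimental):
        """Root-mean-square deviation between paired (x, y, z) coordinate lists."""
        assert len(predicted) == len(experimental)
        sq = sum(math.dist(p, e) ** 2 for p, e in zip(predicted, experimental))
        return math.sqrt(sq / len(predicted))

    # Verification: confirm the internal logic behaves as intended on known cases.
    assert rmsd([(0, 0, 0)], [(0, 0, 0)]) == 0.0
    assert math.isclose(rmsd([(0, 0, 0)], [(3, 4, 0)]), 5.0)

    # Validation (sketch): compare predictions with coordinates determined
    # experimentally by NMR or X-ray crystallography.
    def is_valid(predicted, experimental, cutoff=2.0):
        return rmsd(predicted, experimental) <= cutoff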
Validation also involves certifying that the output of the system as a whole is adequate for the
intended purpose and is consistent with the presumptions of expert opinion. As such, validation is at
least in part a subjective call. The validity of a model is a function of the objectives of the model
designer and the context of its application. For example, the usefulness of a model of protein
structure for a decision-making application is a function of the accuracy of its predictions. There are no absolute notions of "best" or "correct" in model validity assessment, because the degree to which a model needs to reflect or mimic a real-world system varies from case to case. In addition,
because verification is a check for internal consistency, it's possible for a model to be verifiable and
yet fail validation because of errors in the conceptual model.
Executing the simulation ideally generates the output data that can illustrate or answer the problem
initially identified in the problem space. Depending on the methods used, the amount of processing and time required to generate the needed data may be extensive. For example, predicting protein structure using ab initio methods can involve thousands of iterations and take days of supercomputer time to arrive at statistically reliable results.
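A sketch of this step, assuming a hypothetical stochastic predictor standing in for an expensive ab initio run: many independent runs are aggregated so that the reported result is statistically stable rather than an artifact of a single trial.

    import random
    import statistics

    def run_simulation(seed):
        """Stand-in for one expensive ab initio run; returns a hypothetical
        energy score for the predicted structure."""
        rng = random.Random(seed)
        return -100.0 + rng.gauss(0.0, 5.0)

    # Aggregate many independent runs into a statistically reliable summary.
    scores = [run_simulation(seed) for seed in range(1000)]
    print(f"mean energy: {statistics.mean(scores):.2f}  "
          f"stdev: {statistics.stdev(scores):.2f}")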
Visualizing the output data opens the simulator output to human inspection, especially if the output is in the form of 3D graphics that can be assessed qualitatively rather than as tables of textual data. For example, even though the structure of a protein may be described completely in a text file that follows the PDB format, the data take on more meaning when they can be visualized as a 3D rendering.
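A minimal sketch of that last step, assuming matplotlib is available and a local PDB-format file (the filename is hypothetical): alpha-carbon coordinates are pulled from the fixed-column ATOM records and rendered as a 3D backbone trace.

    import matplotlib.pyplot as plt

    # Extract alpha-carbon coordinates from the fixed-column ATOM records.
    xs, ys, zs = [], [], []
    with open("protein.pdb") as f:  # hypothetical local file
        for line in f:
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                xs.append(float(line[30:38]))
                ys.append(float(line[38:46]))
                zs.append(float(line[46:54]))

    # The same columns of text become a trace that can be inspected qualitatively.
    ax = plt.figure().add_subplot(projection="3d")
    ax.plot(xs, ys, zs, marker=".")
    plt.show()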