Biomedical Engineering Reference
In-Depth Information
interactions through the use of DNA neighbor base pair sequence property metrics. A
related approach we have taken in the Center for Intelligent Biomaterials and at AnVil, Inc.
is the application of high-dimensional visualization techniques integrated with machine
learning approaches to important problems possessing large multidimensional data sets.
This methodology is particularly well suited to the discovery of patterns and trends not
easily observed unaided within large complex data sets, where highly nonlinear effects
may exist.
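As one illustration of what a high-dimensional visualization can look like, the sketch below implements a radial coordinate projection (often called RadViz) in plain Python. It is offered only as an example of the general idea: the text above does not say which techniques were used, and the function names normalize_columns and radviz_project are hypothetical. Each dimension is assigned an anchor point on the unit circle, and each record is drawn at the average of those anchors weighted by its rescaled values, so records dominated by the same dimensions land near one another.

```python
import math


def normalize_columns(rows):
    """Min-max rescale every column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def radviz_project(rows):
    """Project d-dimensional records onto 2-D radial coordinates.

    Anchor j sits on the unit circle at angle 2*pi*j/d. A record is placed
    at the weighted mean of the anchors, using its rescaled values as weights.
    Records whose rescaled values are all zero are placed at the origin.
    """
    scaled = normalize_columns(rows)
    d = len(rows[0])
    anchors = [
        (math.cos(2 * math.pi * j / d), math.sin(2 * math.pi * j / d))
        for j in range(d)
    ]
    points = []
    for row in scaled:
        total = sum(row)
        if total == 0:
            points.append((0.0, 0.0))
            continue
        x = sum(w * ax for w, (ax, _) in zip(row, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(row, anchors)) / total
        points.append((x, y))
    return points
```

The returned (x, y) pairs can be drawn as an ordinary scatter plot, colored by class label, to look for clusters that are hard to see in the raw table.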
Important discoveries have already been made using approaches in which machine learning
and informatics applied to large data sets have played a prominent role. One example
is a biosensor called TIGER from Isis Pharmaceuticals, Inc., which is reaching commercialization.
The development of this biosensor first involved discovering bacterial RNA
sequence motifs that are conserved to varying extents and that can reveal the presence of
different bacteria and different strains within a given species. In patient fluid samples, the
sequences to be detected are first amplified using PCR. Then, in a high-throughput
mode, the amplified samples are examined using high-resolution mass spectrometry.
Subsequent informatics treatment of the data reveals characteristic sequence
motifs for different bacterial species and strains. Using this biosensor, these investigators
were able to detect three different species of pathogenic bacteria within the respiratory
fluid samples from a disease outbreak at a US military base (197). They identified a particularly
virulent strain of Streptococcus pyogenes as the primary cause of the outbreak in
all the disease samples analyzed. The TIGER biosensor has unique capabilities. Without
using bacterial culturing techniques, it is able to rapidly detect different bacterial species
and strains within species, as well as to provide rapid identification of the source of infections,
even in cases where the bacteria have not yet been characterized.
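The informatics step described above, in which a measured signature is compared against known organisms and samples that match nothing are still reported, can be sketched roughly as follows. This is a minimal illustration and not the TIGER algorithm: it assumes, purely for the example, that each amplified region has been reduced to a tuple of base counts (A, C, G, T), and the reference table, organism labels, distance measure, and threshold are all hypothetical.

```python
# Hypothetical reference table: label -> base counts (A, C, G, T) for one
# amplified region. These numbers are invented for illustration only.
REFERENCE = {
    "organism_A": (25, 20, 30, 25),
    "organism_B": (27, 18, 29, 26),
    "organism_C": (20, 30, 25, 25),
}


def distance(a, b):
    """Sum of absolute differences between two base-count tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def identify(observed, reference=REFERENCE, max_distance=2):
    """Return (nearest_label, distance, is_match) for an observed signature.

    is_match is False when even the nearest reference is farther away than
    max_distance, which flags the sample as a possibly uncharacterized
    organism while still reporting its closest known relative.
    """
    nearest_label, nearest_dist = min(
        ((label, distance(observed, signature))
         for label, signature in reference.items()),
        key=lambda pair: pair[1],
    )
    return nearest_label, nearest_dist, nearest_dist <= max_distance
```

Reporting the nearest reference even when no match is declared mirrors the idea that an uncharacterized organism can still be placed relative to known ones.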
Another example where machine learning and informatics applied to large data sets have
played a prominent discovery role is the problem of identifying important protein biomarkers
in complex biological fluids. In this instance, proteomic data from mass spectrometry
of serum proteins has been used to accurately diagnose ovarian cancer (198). This
need for pattern discovery in complex data arises in many biological
systems. For example, large numbers of proteins interact in complex interconnected
regulatory networks within all cells. This is the case for both small-molecule metabolic
pathways and complex biological gene control networks comprising DNA sequence sites
interacting with control proteins. These critical cellular systems are just two examples
where informatics and data mining approaches are necessary, even critical, to comprehending
how they function. An understanding of how such systems operate will enable
their future practical use, where subsets of these biological systems may in fact be incorporated
into ever more sophisticated biosensors. The need for machine learning techniques
coupled with novel ways of displaying the data may also be critical for the future
operation of complex biosensors. Analysis of a biosensor's complex, nonlinear measured
input signals and algorithmic learning of its analyte class behavior may be
necessary to produce an accurate, easily interpretable output for the end user.
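To make the idea of learning a biosensor's analyte class behavior concrete, the following is a minimal sketch using a k-nearest-neighbor vote, chosen here only because it is short and makes no linearity assumption. The text does not name a particular algorithm, and the function name knn_classify, the signal vectors, and the analyte labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_signals, train_labels, query, k=3):
    """Label a new signal vector by majority vote among its k nearest
    labeled training signals (Euclidean distance)."""
    ranked = sorted(
        zip(train_signals, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented calibration data: two-channel sensor readings with known analytes.
signals = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
labels = ["analyte_X", "analyte_X", "analyte_X",
          "analyte_Y", "analyte_Y", "analyte_Y"]

print(knn_classify(signals, labels, (0.12, 0.18)))
print(knn_classify(signals, labels, (0.88, 0.82)))
```

The end user sees only the predicted class label, which is the kind of easily interpretable output described above; a real sensor would need far more calibration data and a validated choice of model.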
Acknowledgments
Over the years, the author has worked with many creative faculty colleagues, postdoctoral
fellows, and graduate students on the projects described in this chapter. In particular, he
would like to single out for acknowledgement Prof. Susan Braunhut, Prof. Georges
Grinstein, Prof. Sukant Tripathy, and Prof. George Ruben for their stimulating and