Biomedical Engineering Reference
statistical perspective, is near sequences that tend to be found between introns and exons. However,
even with heuristics, user-directed discovery is inherently limited by the time required to manually
search for new data.
An alternative to manual searching—and one that has had considerable success in the travel,
banking, and telecommunications industries—is to use computer-mediated data mining, the process
of automatically extracting meaningful patterns from usually very large quantities of seemingly
unrelated data. Unlike human-directed exploration of databases, data mining can initiate queries that
aren't limited to the user's fluency in authoring effective database queries. This isn't to say that data
mining reduces the need for the researcher to establish a strategy or to evaluate the results of a data-
mining session. When used in conjunction with the appropriate visualization tools, data mining allows
the researcher to use her highly advanced pattern-recognition skills and knowledge of molecular
biology to determine which results warrant further study. For example, mining the millions of data
points from a series of microarray experiments might reveal several clusters of data, as visualized in
a 3D cluster display. The researcher could then select data belonging to one or more of the clusters
and use a variety of tools to determine the parameters that distinguish it from the other data.
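The cluster-then-inspect workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's method. The source does not name a clustering algorithm, so plain k-means is used here as one common choice. The expression values are hypothetical log-ratios for six genes under three hypothetical conditions (`cond_A`, `cond_B`, `cond_C`). The helper names `kmeans` and `distinguishing_features` are illustrative only. The 3D visualization step is omitted: the sketch selects the cluster containing the first gene and ranks conditions by how strongly its mean differs from the rest of the data.

```python
import random
from statistics import mean

# Hypothetical log-ratio expression profiles: one row per gene, one column per condition.
profiles = [
    [2.1, 0.1, 0.0],
    [1.9, -0.2, 0.1],
    [2.3, 0.0, -0.1],
    [-0.1, 0.2, 1.8],
    [0.0, -0.1, 2.2],
    [0.2, 0.1, 2.0],
]
conditions = ["cond_A", "cond_B", "cond_C"]  # hypothetical condition names


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (an assumed choice of clustering method); returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the centroid with the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster happens to empty out
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def distinguishing_features(points, labels, cluster, names):
    """Rank features by the gap between their mean inside and outside the chosen cluster."""
    inside = [p for p, lab in zip(points, labels) if lab == cluster]
    outside = [p for p, lab in zip(points, labels) if lab != cluster]
    diffs = []
    for j, name in enumerate(names):
        diff = mean(p[j] for p in inside) - mean(p[j] for p in outside)
        diffs.append((name, diff))
    # Largest absolute difference first: these features best separate the cluster.
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    labels = kmeans(profiles, k=2)
    chosen = labels[0]  # "select" the cluster that contains the first gene
    for name, diff in distinguishing_features(profiles, labels, chosen, conditions):
        print(f"{name}: {diff:+.2f}")
```

A mean difference is the simplest possible separation score; in practice a researcher would more likely apply a statistical test or a classifier to decide which parameters genuinely distinguish a cluster.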
Given the ever-increasing store of sequence and protein data from several worldwide genome
projects, data mining the sequences has become a major research focus in bioinformatics. This is in
part because molecular biologists can now conduct basic bioinformatics research from their desktop
workstations, without the overhead of establishing a wet lab. The aim of this chapter is to explore data-
mining techniques as an automated means of reducing the complexity of data in large bioinformatics
databases and of discovering meaningful, useful patterns and relationships in data. The "Methods"
section explores data mining from the perspective of the knowledge-discovery process.
"Technology Overview" reviews the underlying computer infrastructure and algorithms that make
data mining a practical endeavor. "Infrastructure" reviews the hardware and software requirements
of an efficient data-mining operation. "Pattern Recognition and Discovery" explores the basic
pattern-recognition process and how it can be extended to pattern discovery.
The "Machine Learning" section reviews the numerous technologies that can be applied to support
data mining, from neural networks to Hidden Markov Models. "Text Mining" focuses on the
importance of mining the biomedical literature for data on functions to complement the sequence and
structure data mined from nucleotide and protein databases. The "Tools" section introduces some of
the practical general-purpose and bioinformatics-specific tools available for data mining. The "On the
Horizon" section looks at the leading-edge data-mining technologies, especially real-time transaction
monitoring that promises to decrease the infrastructure requirements. The "Endnote" section
explores the long-term role of machine learning versus human-directed data-mining efforts.