Biomedical Engineering Reference
sequence variants of DNA-protein complexes may eventually be predictable through simple
calculations. This would be a valuable capability, since the laborious experimental generation
of altered DNA sequences and testing of their DNA-protein stabilities and
properties would not need to be performed prior to their use in biosensors.
1.4
The Importance of Informatics and Data Mining Approaches in
Understanding Biological Macromolecules and in Biosensor Design and
Operation
Computer science provides ever more sophisticated and important software
applications to all areas of science. Increasingly important are informatics and data
mining approaches applied to the understanding of large, complex data sets related to
biological macromolecules' structures, functions, and intelligent properties. This
understanding is important both for the macromolecules' potential integration into biosensors
and for the analysis of a complex smart biosensor's input data prior to providing a simple
output report to the user. While the analysis of complex biosensor input is important in
some instances, it is a situation we have not encountered in our biosensor development
to date. Therefore, we do not discuss it in great detail. Rather, we illustrate the
importance of these data mining techniques by discussing two representative examples
where complex, high-dimensionality biosensor data can only be properly addressed using
an informatics and data mining approach involving machine learning techniques.
1.4.1
Machine Learning Approaches
A number of the machine learning techniques that are sometimes used in the analysis of
biosensor output are also important in the analysis of large, multidimensional, nonlinear
biological, biochemical, and chemical data sets. Such analyses can be used to understand
the intelligent properties of specific biological macromolecules as well as to incorporate
them into the design of biosensors. The application of informatics and data mining
to data sets in the biological and chemical domains has increasingly been termed, respectively,
bioinformatics and cheminformatics. The latter term overlaps with, and has somewhat
superseded, an older term, chemometrics, in the chemical literature.
Machine learning is the derivation of general knowledge from specific data sets using
statistical techniques and analytical algorithms. The data sets are searched using potential
hypotheses with the goal of building descriptive and predictive models general enough
that they may allow one to draw similar conclusions from other related data sets. Machine
learning can involve either classification techniques, in the case of supervised learning methods, or
clustering techniques, in the case of unsupervised learning methods. Supervised machine
learning methods can be utilized in the analyses of complex biosensor data where classes
of analytes are already known and are used to train the entire system of biosensor inputs.
Thus, machine learning methods could be used to recognize the biosensor input signal
attributes that most accurately identify the correct analyte from among many incorrect
possibilities. The resulting learned combination of input signal attributes that accurately
determine the analyte identity and concentration would then be applied to unknown
samples, to provide the routine biosensor output to the end user. Classification techniques
vary from simple testing of sample features for statistical significance to sophisticated
probabilistic modeling techniques. Some algorithmic methods widely used in machine
learning approaches include naive Bayes, neural networks, support vector machines,
instance-based learning (k-nearest neighbor), and logistic regression, to name a few (173).
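
The supervised workflow described above (train on samples whose analyte class is already known, then apply the learned rule to unknown samples) can be illustrated with a minimal sketch of k-nearest neighbor classification. This is not taken from the text: the two-element signal attribute vectors, the labels analyte_A and analyte_B, and the function name knn_classify are all hypothetical, chosen only to make the idea concrete.

```python
import math
from collections import Counter

# Hypothetical training set: (signal attributes, known analyte label).
# All values and labels are invented for illustration only.
TRAINING = [
    ((0.10, 0.90), "analyte_A"),
    ((0.15, 0.85), "analyte_A"),
    ((0.20, 0.80), "analyte_A"),
    ((0.90, 0.10), "analyte_B"),
    ((0.85, 0.20), "analyte_B"),
    ((0.80, 0.15), "analyte_B"),
]


def knn_classify(sample, training, k=3):
    """Return the majority label among the k training samples nearest to `sample`."""
    # Rank the labeled samples by Euclidean distance to the unknown sample.
    by_distance = sorted(training, key=lambda item: math.dist(sample, item[0]))
    # Let the k closest labeled samples vote on the class.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Classify two unknown samples from their signal attributes.
print(knn_classify((0.12, 0.88), TRAINING))
print(knn_classify((0.88, 0.12), TRAINING))
```

An odd k is used here so that a two-class vote cannot tie. A real biosensor would likely have many more signal attributes per sample, and scaling the attributes before computing distances would usually matter.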