9 CONCLUSIONS
Data mining encompasses a huge range of techniques that address an equally wide
range of problems. Standard statistical approaches have much to offer, provided the
data are appropriate and the algorithms are computationally feasible. With molecular
microbiological data, however, this is not always the case. Many high-throughput
datasets are large, complex and noisy, with unknown degrees of error. Despite these
drawbacks, many CI-based algorithms can be used to deduce new information from
these datasets. Data mining algorithms must, however, always be used with caution,
and their results scrutinised carefully by knowledgeable experts. In many cases,
the most appropriate use of data mining is to inspire new, testable hypotheses
based upon previously unseen patterns in datasets.
The growth in the size of microbiological datasets generated by new high-throughput
technologies is mirrored by the advent of new computational and social paradigms
for the analysis of large datasets. Classic algorithms for clustering, classification
and comparison can be applied to new datasets of unprecedented size using Grid and
Cloud technologies, and the interpretation of results can, at the discretion of the
researcher, be made available to millions of crowd-sourced minds. Data mining first
arose decades ago as a means by which marketers could make the most of trends in
consumer behaviour, but it now offers the promise of guiding microbiologists towards
a deeper understanding of microbial behaviour.