Information retrieval is simply the acquisition of documents from repositories.
Anyone who has used PubMed or a Web-based search engine has used standard
information retrieval methods. Semantics, the assignment of meaning, is a non-trivial
exercise because of the aforementioned complexity of biological terminology.
Assigning semantics is usually done using ontologies. Once a document has been processed
and indexed, information can be extracted from it by means of a database query or a more
specialised algorithm. An excellent review of the intersection between genomics and
natural language processing is given by Yandell and Majoros (2002).
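As a rough illustration of what "processed and indexed" can mean, the sketch below builds an inverted index (a map from each term to the set of documents containing it) and answers a keyword query by intersecting those sets. The tokenizer, the function names and the toy abstracts are hypothetical; production search engines typically add stemming, stop-word removal and relevance ranking on top of this basic structure.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map each token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted IDs of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown terms.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Hypothetical toy "abstracts", used only for illustration.
docs = {
    "doc1": "Quorum sensing regulates biofilm formation in Pseudomonas aeruginosa.",
    "doc2": "Biofilm-associated antibiotic tolerance in Staphylococcus aureus.",
    "doc3": "Pseudomonas aeruginosa virulence factors and quorum sensing.",
}

index = build_index(docs)
print(search(index, "quorum sensing"))  # ['doc1', 'doc3']
print(search(index, "biofilm"))         # ['doc1', 'doc2']
```

Indexing is done once, up front, so each query only touches the short document lists for its own terms rather than rescanning every document.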
One widely used, ontology-based tool is TextPresso, from the Generic Software
Components for Model Organism Databases.⁹ TextPresso splits papers into sentences
and then marks words or phrases with XML tags derived from a specifically
developed ontology. These semantically tagged snippets are stored in a database,
which can be searched by keyword or category. The use of a tool such as TextPresso,
however, requires significantly more technical ability than does a keyword search.
As always, there is a trade-off between simplicity and power.
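The workflow described above (split papers into sentences, tag terms with ontology categories, store the tagged snippets, then search by keyword or category) can be sketched in a few dozen lines. This is not TextPresso's code or ontology: the three-category ontology, the naive regex sentence splitter, the `<category>` tag format and the in-memory list standing in for the database are all hypothetical choices made for illustration.

```python
import re

# Hypothetical miniature ontology: category -> terms in that category.
# A real text-mining ontology is far larger and curated by experts.
ONTOLOGY = {
    "organism": ["escherichia coli", "salmonella"],
    "gene": ["rpos", "ompc"],
    "regulation": ["regulates", "represses", "induced"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    It would wrongly split abbreviations such as "E. coli".
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_sentence(sentence):
    """Wrap each ontology term in an XML-style tag named after its category.

    Returns (tagged_sentence, set_of_matched_categories). Assumes terms do not
    overlap with each other or with the tag names themselves.
    """
    tagged, categories = sentence, set()
    for category, terms in ONTOLOGY.items():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            if pattern.search(tagged):
                categories.add(category)
                tagged = pattern.sub(
                    lambda m, c=category: f"<{c}>{m.group(0)}</{c}>", tagged
                )
    return tagged, categories


def build_store(papers):
    """Turn {paper_id: text} into tagged sentence records (a list stands in for the database)."""
    store = []
    for paper_id, text in papers.items():
        for sentence in split_sentences(text):
            tagged, categories = tag_sentence(sentence)
            store.append({"paper": paper_id, "text": sentence,
                          "tagged": tagged, "categories": categories})
    return store


def search_snippets(store, keyword=None, category=None):
    """Return records containing `keyword` (case-insensitive) and/or tagged with `category`."""
    results = []
    for rec in store:
        if keyword and keyword.lower() not in rec["text"].lower():
            continue
        if category and category not in rec["categories"]:
            continue
        results.append(rec)
    return results


papers = {  # hypothetical input text
    "paperA": "RpoS regulates stress genes in Escherichia coli. Growth was measured at 37 C.",
    "paperB": "In Salmonella, OmpC is induced by osmotic shock.",
}
store = build_store(papers)
for rec in search_snippets(store, category="gene"):
    print(rec["paper"], rec["tagged"])
```

A category search returns every sentence in which any term from that category was tagged, whatever its surface wording, which plain keyword matching cannot do; the price is building and maintaining the ontology.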
To address this problem, Web-based applications incorporating a range of
algorithms are becoming increasingly popular; a recent review of 28 such tools led
to the construction of an overview site, hosted at the NCBI, that tracks existing
systems and new developments in biomedical literature search (Lu, 2011).
Text mining, alone or in combination with other data mining approaches, has been
applied to a wide range of questions in microbiology research. Some interesting recent
examples include: automated inference of microorganism habitats (Kolluru et al., 2011);
exploring the dynamics of relationships between pathogens and infectious diseases
(Sintchenko et al., 2010); identifying viruses and bacteria with the potential to be used
as bioterrorism weapons (Hu et al., 2008); and identifying molecules with potential
pharmacological action (Sarker et al., 2012).
Software Availability
PATRIC (Pathosystems Resource Integration Center): http://patricbrc.vbi.vt.edu/portal/portal/patric/Home
(includes a knowledge base constructed using text mining, plus several other valuable tools).
Anni2.1: http://biosemantics.org/index.php?page=anni-2-0
NCBI list of biomedical text mining Web sites: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search/
TextPresso: http://www.gmod.org/wiki/Textpresso
9 http://www.gmod.org/wiki/Main_Page