Genomic Data Explosion – The Challenge for Bioinformatics? - Advances in Data Mining

mechanisms and complexity. The network architecture of type ART1 self-organises

and self-stabilises its recognition codes and categorises arbitrarily many and

arbitrarily complex binary input patterns [20]. We obtain the input patterns for ART1

from gene expression microarray data of different samples of the same disease by

using binary coding. As a result of the ART1 analysis we obtain a specific pattern of

co-expressed genes, which is taken to be typical of the disease under

consideration. Such a gene pattern is one of the essential components needed

for generating genetic networks.
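
The binary coding and the category search can be illustrated with a minimal sketch. It is not the implementation used in this work: the fixed expression threshold, the function names `binarize` and `art1_cluster`, and the parameter values are assumptions made for illustration. Each binary pattern is represented as the set of genes coded as 1, and the sketch follows the commonly used simplified fast-learning form of ART1 in a single pass over the samples, without iterating until the categories are stable.

```python
def binarize(sample, threshold):
    """Binary-code one sample: keep the genes whose expression exceeds the threshold.

    The fixed threshold is an assumption; the text does not specify the coding.
    """
    return frozenset(gene for gene, value in sample.items() if value > threshold)


def art1_cluster(patterns, vigilance=0.6, beta=1.0):
    """Single-pass, fast-learning ART1-style clustering of binary patterns.

    Each pattern is a set of active genes. Returns (prototypes, labels);
    a prototype is the intersection (logical AND) of all patterns assigned
    to its category so far.
    """
    prototypes = []
    labels = []
    for x in patterns:
        # Rank existing categories by the choice function |x AND w| / (beta + |w|).
        ranked = sorted(
            range(len(prototypes)),
            key=lambda j: len(x & prototypes[j]) / (beta + len(prototypes[j])),
            reverse=True,
        )
        for j in ranked:
            # Vigilance test: fraction of the input's active genes matched by w.
            match = len(x & prototypes[j]) / max(len(x), 1)
            if match >= vigilance:
                prototypes[j] = x & prototypes[j]  # fast learning: w := x AND w
                labels.append(j)
                break
        else:
            # No category passed the vigilance test: commit a new category.
            prototypes.append(frozenset(x))
            labels.append(len(prototypes) - 1)
    return prototypes, labels


# Hypothetical expression values for three samples (gene names are placeholders).
samples = [
    {"g1": 2.4, "g2": 3.1, "g3": 0.2, "g4": 1.9, "g5": 0.1},
    {"g1": 2.0, "g2": 2.7, "g3": 0.3, "g4": 0.4, "g5": 0.2},
    {"g1": 0.1, "g2": 0.3, "g3": 2.2, "g4": 0.2, "g5": 2.8},
]
patterns = [binarize(sample, threshold=1.0) for sample in samples]
prototypes, labels = art1_cluster(patterns, vigilance=0.6)
```

In this toy run the first two samples fall into one category whose prototype contains only the genes expressed in both, which corresponds to the pattern of co-expressed genes described above. A higher vigilance value yields more, and more specific, categories.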

Other interesting bioinformatic applications of neural networks are discussed in the

biomedical literature, for instance an approach for classifying the nursing care

needed using incomplete data [21], an approach for detecting periodicities in

protein sequences and thereby increasing the prediction accuracy of secondary

structure [22], and an approach for predicting drug absorption from structural properties [23].

4.3 Text Mining

Mining Causal Relationships between Genes from Unstructured Text. Using the

method described in 5.2, a subset of genes is classified by the neural network for a

specific biological context, e.g. for a disease under consideration. Since we want to

construct a causal genetic network automatically, the next type of information we

need concerns causal relationships between the classified genes.

One of the richest sources of knowledge nowadays is the internet. This is especially

true for the biological domain and within this domain for the field of genomics. A

huge amount of data is now available to the public. Much of this data is stored in

publicly available databases. Therefore it is reasonable to integrate this knowledge

into the construction of genetic networks. A straightforward approach is to find

databases which contain the type of information we are looking for. For our work we

found the appropriate information in the GeNet database. We designed and

implemented a tool that consists of three components working in sequence. First, a

database adapter connects to the internet database GeNet, queries the data and

stores all query results locally on the computer. Second, a parser analyses the stored

data and extracts the wanted information. Finally, a filter searches for redundant

and inconsistent data and prepares the resulting gene relation information for

import into the software system for generating and presenting genetic

networks.
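
The three components can be sketched as a small pipeline. This is only an illustration under stated assumptions: the GeNet query interface and its reply format are not described here, so the adapter takes a caller-supplied `fetch` function, and the parser assumes a hypothetical tab-separated format with one `source`, `relation`, `target` triple per line. All function names are invented for this sketch. The filter treats a gene pair reported with two different relations as inconsistent and drops it, which is one possible policy among several.

```python
from pathlib import Path


def fetch_and_store(fetch, queries, cache_dir):
    """Adapter: run each query through `fetch` and save the raw reply locally.

    `fetch` is a placeholder for the real database access, which is not shown.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, query in enumerate(queries):
        path = cache_dir / f"query_{i}.txt"
        path.write_text(fetch(query), encoding="utf-8")
        paths.append(path)
    return paths


def parse_relations(paths):
    """Parser: extract (source, relation, target) triples from the stored replies.

    Assumes a hypothetical tab-separated format; malformed lines are skipped.
    """
    triples = []
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            fields = [field.strip() for field in line.split("\t")]
            if len(fields) == 3 and all(fields):
                triples.append(tuple(fields))
    return triples


def filter_relations(triples):
    """Filter: remove duplicates and gene pairs with contradictory relations."""
    first_seen = {}
    conflicting = set()
    for source, relation, target in triples:
        pair = (source, target)
        # setdefault stores the first relation seen for a pair and returns it.
        if first_seen.setdefault(pair, relation) != relation:
            conflicting.add(pair)
    return sorted(
        (source, relation, target)
        for (source, target), relation in first_seen.items()
        if (source, target) not in conflicting
    )
```

Keeping the raw query results on disk, as the adapter does, means the parser and the filter can be rerun or changed without querying the database again.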

However, a lot of specific knowledge is not available in such a structured form. It is

distributed across the net and presented as unstructured text. In our case

relationships between genes are not available in special databases, but they may be

found in the biomedical literature. Most of these articles are available online. One of the key

databases for publications in this field is the PubMed database. PubMed contains over

11 million abstracts today and approximately 40,000 new abstracts are added each

month. To use this source of information we have to deploy more sophisticated

methods than those described above. One way to integrate this knowledge

automatically into the analysis of the microarray data is the use of

Information Extraction (IE) techniques. In this paragraph we first give a definition of IE. Then

we will focus on the problems to deal with when applying IE to the biological
