Applications of Frequent Pattern Mining - Frequent Pattern Mining

approach is applicable to both chemical and biological data. One application that is relevant to both domains is finding relevant substructures of molecules [25, 40]. In the following, we discuss some useful applications in both domains.

12.1 Chemical Applications

Since frequent pattern mining is closely related to classification, as discussed earlier, many methods have been developed for predictive tasks with the use of frequent pattern mining. Examples of such tasks include carcinogenesis prediction [117] and predictive toxicology evaluation [118]. Key characteristics of compounds can often be captured by descriptor-based representations [24, 72]. The properties which are tracked are generally structure-driven, and may correspond to activity, toxicity, absorption, distribution, metabolism and excretion [24]. A natural way of mining these descriptors is with the use of algorithms such as frequent subgraph mining. Frequent subgraphs of a chemical graph database are defined as all subgraphs that are present in at least a certain minimum number of compounds in the database. This threshold is essentially the minimum support requirement, and the subgraphs that satisfy it define the descriptors for the compounds in the database. The main challenge here is that the optimum value of the minimum support may not be known a priori for a given database. Nevertheless, since different data sets may contain different numbers of descriptors, with different supports, sizes, and shapes, such an approach provides some flexibility through the minimum support parameter, as long as an effective approach for tuning it is available. Such descriptors are quite useful for chemical compound classification, since they encode important properties of the compound that may be highly relevant to classification. An example of such an approach is discussed in [41], which uses the descriptors defined by frequent subgraphs for chemical compound classification.
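The idea of support-thresholded substructure descriptors can be illustrated with a small sketch. It is deliberately simplified and is not the method of [41]: instead of general subgraphs it only enumerates linear paths of at most two bonds, which avoids the canonical-labeling machinery a full frequent subgraph miner needs. The function names (`path_patterns`, `frequent_patterns`, `descriptor_vector`), the molecule encoding, and the toy compounds are all assumptions made for illustration.

```python
from collections import defaultdict


def path_patterns(atoms, bonds, max_bonds=2):
    """Return the set of labelled paths (linear subgraphs) found in one molecule.

    atoms: dict mapping a node id to an element label, e.g. {0: "C", 1: "O"}
    bonds: list of (node_id, node_id, bond_label) tuples
    """
    adjacency = defaultdict(list)
    for u, v, bond in bonds:
        adjacency[u].append((v, bond))
        adjacency[v].append((u, bond))

    found = set()

    def extend(path_nodes, labels):
        # labels alternates atom, bond, atom, ...; a path and its reverse
        # describe the same subgraph, so keep the smaller of the two.
        if len(path_nodes) > 1:
            found.add(min(tuple(labels), tuple(reversed(labels))))
        if len(path_nodes) - 1 == max_bonds:
            return
        for nxt, bond in adjacency[path_nodes[-1]]:
            if nxt not in path_nodes:
                extend(path_nodes + [nxt], labels + [bond, atoms[nxt]])

    for start in atoms:
        extend([start], [atoms[start]])
    return found


def frequent_patterns(molecules, min_support):
    """Keep the patterns that occur in at least min_support molecules."""
    support = defaultdict(int)
    for atoms, bonds in molecules:
        for pattern in path_patterns(atoms, bonds):
            support[pattern] += 1  # counted once per molecule
    return {p: s for p, s in support.items() if s >= min_support}


def descriptor_vector(molecule, patterns):
    """Binary descriptor: 1 if the molecule contains the pattern, else 0."""
    present = path_patterns(*molecule)
    return [1 if p in present else 0 for p in patterns]


# Toy database (heavy atoms only, hypothetical encoding).
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1, "-"), (1, 2, "-")])
methanol = ({0: "C", 1: "O"}, [(0, 1, "-")])
ether = ({0: "C", 1: "O", 2: "C"}, [(0, 1, "-"), (1, 2, "-")])
ethane = ({0: "C", 1: "C"}, [(0, 1, "-")])
database = [ethanol, methanol, ether, ethane]

frequent = frequent_patterns(database, min_support=2)
features = sorted(frequent)
vectors = [descriptor_vector(m, features) for m in database]
```

Raising or lowering `min_support` changes how many patterns survive and therefore the length of each descriptor vector, which is the tuning trade-off described above. The resulting binary vectors could then be passed to any standard classifier.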

12.2 Biological Applications

Biological data is available either in the form of sequence data or graph-structured data. In both cases, frequent pattern mining methods can be very helpful in discovering different kinds of insights. Much biological and microarray data can be expressed as sequences in its most simplified form. In these cases, many algorithms have been developed to determine useful frequent patterns from these sequences [34, 35, 89, 104, 105, 111, 123, 125]. One special characteristic of biological data is that the number of rows may not be too large, but each individual row may be very long. As a result, row-enumeration techniques, which search over combinations of rows rather than combinations of items, are often used in such scenarios.
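Row enumeration can be sketched as follows. This is a brute-force illustration, not any of the cited algorithms, and it omits the pruning that practical row-enumeration methods rely on: it intersects the items of every group of at least `min_support` rows, and each non-empty intersection is a closed frequent pattern. The function name and the gene identifiers are hypothetical.

```python
from itertools import combinations


def closed_patterns_by_row_enumeration(rows, min_support):
    """Map each closed frequent itemset to its support.

    rows: list of frozensets, one per row (e.g. one per sample)
    The search space is the set of row combinations, so the cost is
    exponential in the number of rows but not in the row length.
    """
    closed = {}
    smallest = max(1, min_support)  # an empty row group has no intersection
    for size in range(smallest, len(rows) + 1):
        for row_ids in combinations(range(len(rows)), size):
            common = frozenset.intersection(*(rows[i] for i in row_ids))
            if common and common not in closed:
                # Support is the number of rows that contain the pattern.
                closed[common] = sum(1 for row in rows if common <= row)
    return closed


# Few rows, potentially very long rows: each row lists the genes
# flagged in one sample (hypothetical identifiers).
samples = [
    frozenset({"g1", "g2", "g3", "g5"}),
    frozenset({"g1", "g2", "g4"}),
    frozenset({"g1", "g2", "g3"}),
]
patterns = closed_patterns_by_row_enumeration(samples, min_support=2)
```

With three rows there are only four row groups of size two or more to examine, regardless of how many items each row holds. An item-enumeration search over long rows would face a far larger candidate space.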

Such patterns provide an idea of the characteristics of the underlying data, and may also be used for other data mining tasks such as classification. The issue
