Computational Approaches to Fragment and Substructure Discovery and Evaluation - Fragment-Based Drug Discovery


offers finding fragments with carbon chains of varying length. This can be useful for the

exploration of biochemical reactions where this length is less important. [ 16 ]

Interestingly, the four fragment miners mentioned above have been made available as

a single package named ParMol (Parallel Molecular Mining). [ 17 ] In addition to uniform

access to MoFa, gSpan, FFSM and Gaston, the authors included a 2D viewer for molecular

structures, parallel (multiprocessor) search, support for several file formats such as

SMILES and SDF, and a number of options to customize mining.

Other algorithms for frequent fragment mining that are more database-centric include

Molfea [ 18 ] and Warmr. [ 19 ] Molfea (Molecular Feature Miner) [ 18 ] is in essence an inductive

database framework. It finds patterns based on first-order logic. Molecules are encoded as

basic facts and queries result in a combination of facts. The fragments that can be searched

for or result from queries are linear sequences of non-hydrogen atoms and bonds. The fact

that Molfea only finds chains of atoms limits its usefulness since almost all molecules have

rings or branching points. Warmr [ 19 ] is a general-purpose Inductive Logic Programming

(ILP) data-mining tool for finding frequently occurring patterns in relational data. [ 20 ] [ILP

is a machine learning technique used for knowledge discovery. The purpose of ILP is

hypothesis generation, given some background knowledge and a set of positive and negative

examples. Examples and background knowledge are encoded as facts and rules

in a relational database. From this, possible hypotheses are generated through inductive

learning. Logic programming is used to represent examples, background knowledge and

hypotheses in a uniform way.] ILP has been successfully applied to chemical data, for

instance to find frequent substructures in carcinogenic compounds. First, molecules are

described in a relational language. Atoms are related to molecules and to other atoms

through bonds. Algorithms such as Warmr perform multi-relational data mining, which

means they are capable of finding patterns that span across multiple relations. Warmr

searches the available patterns in a breadth-first manner, starting from the most general

relations and gradually increasing the level of complexity, to find patterns that are more

specific. Candidates for the next, more specific level are generated by extending frequent patterns;

nonfrequent patterns are pruned. Several meaningful relationships were reported when ILP was applied

to toxicity data. Although Warmr should be able to produce the same results as

the fragment miners, it inherits some of the drawbacks related to ILP. First, a high

level of expertise is required to encode the molecules, i.e. the graphs and their properties,

into relations that can be mined. Second, the complexity of the relations queried places high

demands on computing resources. [ 19 ]
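The level-wise (breadth-first) search described above can be illustrated with a minimal Python sketch. It is not the Molfea or Warmr implementation: it does not use first-order logic, and it mines only linear chains of atom labels, the kind of fragment Molfea is restricted to. The molecule encoding (a dict of atom labels plus a list of bonds, bond orders ignored) and the names canonical, linear_chains and frequent_chains are assumptions made for this illustration only.

```python
from collections import defaultdict


def canonical(labels):
    """Direction-independent form of a chain: C-C-O and O-C-C are the same chain."""
    labels = tuple(labels)
    return min(labels, labels[::-1])


def linear_chains(atoms, bonds, length):
    """Set of canonical label chains of `length` atoms along simple paths.

    atoms: dict mapping atom index -> element symbol (hypothetical encoding)
    bonds: list of (i, j) index pairs; bond orders are ignored for brevity
    """
    adj = defaultdict(set)
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)

    found = set()

    def extend(path):
        if len(path) == length:
            found.add(canonical(atoms[i] for i in path))
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only, no atom visited twice
                extend(path + [nxt])

    for start in atoms:
        extend([start])
    return found


def frequent_chains(molecules, min_support, max_length):
    """Level-wise search from short (general) to long (specific) chains.

    Support is the number of molecules that contain a chain. A chain is only
    counted at level k+1 if both of its k-atom sub-chains were frequent.
    """
    frequent = {}
    previous = None  # frequent chains of the previous level
    for length in range(1, max_length + 1):
        counts = defaultdict(int)
        for atoms, bonds in molecules:
            for chain in linear_chains(atoms, bonds, length):
                if previous is not None:
                    subchains = (canonical(chain[:-1]), canonical(chain[1:]))
                    if any(s not in previous for s in subchains):
                        continue  # pruned: contains a nonfrequent sub-chain
                counts[chain] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        if not level:
            break  # nothing frequent here, so no longer chain can be frequent
        frequent.update(level)
        previous = level
    return frequent


# Heavy-atom skeletons of ethanol, 1-propanol and dimethyl ether.
molecules = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)]),
    ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),
]
result = frequent_chains(molecules, min_support=2, max_length=3)
```

With a minimum support of two molecules, the chain C-C-O is reported as frequent (it occurs in ethanol and 1-propanol), whereas C-O-C and C-C-C each occur in only one molecule and are dropped. The pruning step relies on the fact that a chain cannot be more frequent than any of its sub-chains.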

Common substructures. Fragments are also derived by comparing molecular structures.

For a pair of molecules, a number of substructures/fragments may exist that occur in both

structures. A 'common substructure' is a set of atoms that two molecules have in common.

Corresponding atoms should have the same atom type and the same topological distance to

other common atoms, in both molecules. The topological distance is the number of bonds

that form the shortest path between two atoms. The 'maximum common substructure'

(MCS) is a continuously bonded substructure that has the highest number of common

atoms. [ 21 ] Note that there may be multiple MCSs for a pair of molecules. Figure 8.5 shows an

example of the MCS of two molecules, the larger of which is the molecule from Figure 8.1.
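The definition of a common substructure given above can be checked mechanically. The following Python sketch computes topological distances by breadth-first search and tests whether a proposed atom-to-atom mapping satisfies both conditions (same atom type, same topological distance between every pair of common atoms). It only verifies a given mapping; it does not search for the MCS. The molecule encoding (a list of atom labels plus a list of bonds) and the function names are assumptions for this illustration, and distances are measured in the full molecules.

```python
from collections import deque
from itertools import combinations


def topological_distances(n_atoms, bonds):
    """All-pairs topological distances (number of bonds on the shortest path)."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    dist = []
    for source in range(n_atoms):
        d = [None] * n_atoms  # None marks atoms not reachable from source
        d[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def is_common_substructure(mol_a, mol_b, mapping):
    """Check a proposed mapping {atom index in A: atom index in B}.

    Each molecule is (labels, bonds): a list of element symbols and a list of
    (i, j) bonds. This encoding is hypothetical and ignores bond orders.
    """
    labels_a, bonds_a = mol_a
    labels_b, bonds_b = mol_b
    if len(set(mapping.values())) != len(mapping):
        return False  # the mapping must be one-to-one
    if any(labels_a[a] != labels_b[b] for a, b in mapping.items()):
        return False  # corresponding atoms must have the same atom type
    dist_a = topological_distances(len(labels_a), bonds_a)
    dist_b = topological_distances(len(labels_b), bonds_b)
    # Every pair of common atoms must be equally far apart in both molecules.
    return all(
        dist_a[a1][a2] == dist_b[mapping[a1]][mapping[a2]]
        for a1, a2 in combinations(mapping, 2)
    )


# Heavy-atom skeletons of ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

good = is_common_substructure(ethanol, propanol, {0: 1, 1: 2, 2: 3})
bad = is_common_substructure(ethanol, propanol, {0: 0, 1: 2, 2: 3})
```

Here the first mapping is accepted, because the C-C-O end of 1-propanol matches ethanol atom for atom. The second mapping is rejected: the two carbons are one bond apart in ethanol but two bonds apart in 1-propanol.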

The 'highest scoring common substructure' (HSCS) [ 21 ] is similar to the MCS, but also allows

discontinuous common substructures. Scores are based on the number of common atoms
