value. Finally, the hclust R function is called on the distances
(1.0 - tanimoto). In this example, two methods are used: ward and single.
Figure 12.1 shows the output of the plot function for ward clustering.
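A minimal sketch of the same clustering pipeline done entirely in R, assuming the fingerprints are already available as a 0/1 matrix. The matrix fp and its values are hypothetical toy data, not query results. For binary vectors, Tanimoto similarity is c / (a + b - c), where c is the number of bits set in both vectors and a and b are the bits set in each one. Current R versions call the original Ward method "ward.D"; "ward" is the name used in R 3.0.3 and earlier.

```r
# Hypothetical 0/1 fingerprints: rows are molecules, columns are bits
fp <- matrix(c(1, 1, 1, 0, 0, 0,
               1, 1, 0, 0, 0, 0,
               0, 0, 0, 1, 1, 1,
               0, 0, 0, 1, 1, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("m", 1:4), NULL))

# c = bits set in both molecules; a, b = bits set in each molecule
common <- fp %*% t(fp)
nbits  <- rowSums(fp)
tanimoto <- common / (outer(nbits, nbits, "+") - common)

# hclust() works on dissimilarities, hence 1.0 - tanimoto
d <- as.dist(1.0 - tanimoto)

hc_ward   <- hclust(d, method = "ward.D")  # called "ward" in R <= 3.0.3
hc_single <- hclust(d, method = "single")

print(round(tanimoto, 3))
print(cutree(hc_single, k = 2))
# plot(hc_ward) draws a dendrogram of this toy data
```

With the four toy molecules, m1/m2 and m3/m4 share bits and end up in separate clusters under both methods.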
The use of the R functions is described elsewhere [2,3]. The point of
this example is not to explain how to do clustering, but rather to show
how easily data can be read into R from an RDBMS. The full power of the
SQL language and its extension functions, such as tanimoto here, allows
great flexibility in creating data frames for R. With very little change to
the code above, various clustering methods can be tested. With simple
changes to the SQL statement, other similarity measures, such as
euclid or hamming, can be investigated.
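The euclid and hamming measures mentioned above are SQL extension functions. As a rough in-R analogue only, the sketch below swaps in base R's dist() instead of those functions and applies it to a hypothetical 0/1 fingerprint matrix. For binary vectors, the Manhattan distance counts the differing bits, which is the Hamming distance. The Euclidean distance is the square root of that count.

```r
# Hypothetical 0/1 fingerprints (same layout as a fingerprint bit matrix)
fp <- matrix(c(1, 1, 1, 0, 0, 0,
               1, 1, 0, 0, 0, 0,
               0, 0, 0, 1, 1, 1,
               0, 0, 0, 1, 1, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("m", 1:4), NULL))

# Manhattan distance on 0/1 data = number of differing bits (Hamming)
d_hamming <- dist(fp, method = "manhattan")
# Euclidean distance on 0/1 data = sqrt(Hamming)
d_euclid  <- dist(fp, method = "euclidean")

# Only the distance object changes; the clustering call stays the same
for (m in c("ward.D", "single")) {
  print(cutree(hclust(d_hamming, method = m), k = 2))
}
```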
12.5.2 Linear Models
R contains a useful linear models function, lm, for regression analysis,
that is, for fitting a set of parameters to experimental data. The example
here estimates logP values from a set of fragments and coefficients, such that
$$\mathrm{glogp} = \sum_i C_i \, N_i$$
(Formula 12.1)
where glogp is the estimated logP value for a molecule, N_i is the number
of times fragment i occurs in the molecule, and C_i is the coefficient for
fragment i resulting from the linear models fit. The fragments are defined as
a set of SMARTS [4] to be matched against each molecule using the
count_matches function described in Chapter 7. The following R script shows
how this is accomplished.
require("RODBC");
channel = odbcConnect("PostgreSQL30", uid="reader",
case="postgresql");
# get experimental logp from training_set
sql = "Select logp from xlogp.training_set order by id";
logpval = sqlQuery(channel, sql, max=0);
ntrain = length(logpval$logp);
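# get the fragment smarts; ties in train_freq are broken by smarts so that
# this row order matches the fragment order of the match query below
sql = "select smarts,train_freq from xlogp.simplex
where train_freq > 1 order by train_freq desc, smarts";
smarts = sqlQuery(channel, sql, max=0);
# match each smiles in the training_set to each fragment smarts;
# rows come back grouped by fragment, then ordered by molecule id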
sql = "Select count_matches(smiles,smarts) as matches from
(select smarts, train_freq from xlogp.simplex
where train_freq > 1) as smarts,
(select smiles, id from xlogp.training_set order by id) as train
order by train_freq desc, smarts, id";
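Because the match query orders its rows by fragment and then by molecule id, the matches column can be reshaped into a molecule-by-fragment matrix of N_i counts and passed to lm. The following is a minimal, self-contained sketch of that step under stated assumptions. It is not the book's own continuation of the script. The counts and logP values are hypothetical toy numbers standing in for the query results. The names counts, train, true_coef, and fit are illustrative. The intercept is dropped because Formula 12.1 has no constant term.

```r
# Hypothetical stand-ins for the query results: 5 molecules, 3 fragments
ntrain <- 5
nfrag  <- 3
# count_matches() results in "order by train_freq desc, smarts, id" order:
# all molecules for fragment 1, then fragment 2, then fragment 3
matches <- c(1, 2, 0, 3, 1,   # fragment 1
             0, 1, 1, 0, 2,   # fragment 2
             2, 0, 1, 0, 1)   # fragment 3

# R fills matrices column by column: one row per molecule, one column per fragment
counts <- matrix(matches, nrow = ntrain, ncol = nfrag)
colnames(counts) <- paste0("frag", seq_len(nfrag))

# Hypothetical "experimental" logP values generated from known coefficients
true_coef <- c(0.5, -1.0, 0.25)
logp <- as.vector(counts %*% true_coef)

train <- data.frame(logp = logp, counts)

# "0 +" suppresses the intercept, matching Formula 12.1
fit <- lm(logp ~ 0 + ., data = train)
coef_fit <- coef(fit)                      # the fitted C_i

# glogp = sum_i C_i * N_i for each molecule
glogp <- as.vector(counts %*% coef_fit)
print(round(coef_fit, 3))
print(round(glogp, 3))
```

With real data, the matrix would be built from the matches column returned by sqlQuery and the logp column from the first query, with ntrain molecules per fragment.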