This function takes a SMILES string and a SMiles ARbitrary Target
Specification (SMARTS) pattern. The SMARTS is used to locate a
substructure within the SMILES and to color the atoms that are matched.
12.5 R Programs
R is a program for statistical computing on sets of data. One of the
most commonly used data types in R is the data frame. A data frame has
many similarities to a table of data in a relational database, or to the
table resulting from an SQL select statement. Using the RODBC package, R
can communicate with an RDBMS using SQL, both to read data into a data
frame and to write a data frame to an RDBMS table.
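As a minimal sketch of the analogy between data frames and SQL result
tables, the following builds a small data frame in memory and selects
rows and columns the way a select statement with a where clause would.
The column names and values are invented for illustration and do not
come from any database discussed here.

```r
# Hypothetical data: ids, names and logp values are illustrative only.
compounds = data.frame(
  id   = c(1, 2, 3, 4),
  name = c("A", "B", "C", "D"),
  logp = c(2.1, -0.3, 2.7, -0.2),
  stringsAsFactors = FALSE
)

# Comparable to: Select id, name From compounds Where logp > 2.0
lipophilic = subset(compounds, logp > 2.0, select = c(id, name))
print(lipophilic)

# Comparable to: Select count(*), avg(logp) From compounds
n_rows = nrow(compounds)
mean_logp = mean(compounds$logp)
```

With RODBC, sqlQuery returns a data frame of this kind, and sqlSave
writes one back to a table, so the same subsetting and summary
operations apply to data read from an RDBMS.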
12.5.1 Hierarchical Clustering
The following example shows how R can be used to carry out a clustering
analysis using data stored in an RDBMS.
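require("RODBC");
# Open an ODBC connection to the PostgreSQL data source.
channel = odbcConnect("PostgreSQL30", uid="reader",
                      case="postgresql");
# Compute the similarity once per pair (a.id < b.id); every other
# element is returned as null. The order by clause fixes the row order
# that the matrix() call below relies on.
sql = "Select
 Case When a.id < b.id Then tanimoto(a.gfp,b.gfp)
 Else null
 End
 from xlogp.test_set as a, xlogp.test_set as b
 order by a.id, b.id";
tani = sqlQuery(channel, sql, max=0);
n = sqrt(length(tani[,1]))
# matrix() fills column by column; as.dist() keeps the lower triangle.
tanimoto = as.dist(matrix(tani[,1], nrow=n, ncol=n));
# Cluster on the dissimilarity, 1.0 minus the similarity.
# The "ward" method is named "ward.D" in R 3.1.0 and later.
fit = hclust(1.0-tanimoto, method="ward.D");
plot(fit);
fit = hclust(1.0-tanimoto, method="single");
plot(fit);
close(channel);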
The SQL statement above computes the Tanimoto similarity between all
pairs of compounds using fingerprint bitstrings stored in the column
gfp. The tanimoto function is described in Chapter 8 and shown in the
Appendix. The SQL statement uses the Case conditional clause to avoid
computing elements unnecessarily: the matrix of similarities is
symmetric and its diagonal elements are exactly 1, so only the elements
with a.id < b.id need to be computed. The sqlQuery R function reads the
elements of the similarity matrix, one per row, into an R data.frame
named tani. This is coerced into a matrix of the correct number of rows
and columns using the matrix function, and further coerced into an R
distance object using as.dist. The R distance object is the lower half
of a symmetric distance matrix. Since the Tanimoto similarity is used,
the distance (or dissimilarity) is represented by 1.0 minus the Tanimoto
similarity.
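
The coercion steps can be tried without a database. The sketch below is
an illustration under stated assumptions: the four bit-vector
fingerprints are invented, and the helper tanimoto_bits is a
hypothetical stand-in for the SQL tanimoto function, using the usual
definition (bits set in both fingerprints divided by bits set in either
one). The vector sim mimics the query result, with a varying slowest and
b fastest, and NA wherever the Case expression would return null.

```r
# Invented 8-bit fingerprints, one row per compound (illustrative only).
fp = rbind(
  c(1, 1, 1, 1, 0, 0, 0, 0),
  c(1, 1, 1, 0, 0, 0, 0, 0),
  c(0, 0, 0, 0, 1, 1, 1, 1),
  c(0, 0, 0, 0, 0, 1, 1, 1)
)
n = nrow(fp)

# Hypothetical stand-in for the SQL tanimoto function:
# bits set in both fingerprints / bits set in either fingerprint.
tanimoto_bits = function(x, y) sum(x & y) / sum(x | y)

# Mimic the query result: a varies slowest, b fastest,
# NA (SQL null) unless a < b.
sim = numeric(0)
for (a in 1:n) {
  for (b in 1:n) {
    sim = c(sim, if (a < b) tanimoto_bits(fp[a, ], fp[b, ]) else NA)
  }
}

# matrix() fills column by column, so the pair (a, b) lands in row b,
# column a: the computed values occupy the lower triangle.
m = matrix(sim, nrow = n, ncol = n)
tanimoto = as.dist(m)          # keeps only the lower triangle
fit = hclust(1.0 - tanimoto, method = "single")
groups = cutree(fit, k = 2)    # cut the tree into two clusters
print(groups)
```

Because the NA values sit on the diagonal and in the upper triangle,
as.dist never reads them. With these invented fingerprints, compounds 1
and 2 fall into one cluster and compounds 3 and 4 into the other.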