Chemistry Reference
In-Depth Information
Spelling anomalies can be easily identified and corrected, assigning
the refid of the correct spelling to those that are incorrect. Of course, the
original table from which the references came would have to be updated
as well to contain the refid for the correct spelling. Concerning spelling
and accuracy mistakes, there is little help that a programmatic approach
can offer here. A healthy database needs to be well curated. Notice how-
ever that putting tables in third normal forms brings such troubles to light
and makes correcting them relatively straightforward.
A.4 SQL Functions
Several SQL functions are discussed in the earlier chapters. This section
shows the code needed to define these functions and make them available
for use in a PostgreSQL database.
A.4.1 Public166keys
This function returns a length 166 fragment key. The input text string ( $1
in the function body) is a SMILES, as expected by the matches function .
The table of fragments is based on the MACCS 166 public keys 1 and is
shown in Table A.5.3 of this Appendix.
Create Function public166keys(character varying)
Returns bit(166) As $EOSQL$
Select orsum(bit_set(0::bit(166),abit))
From public166keys Where matches($1,smarts);
$EOSQL$ Language SQL;
A.4.2 Orsum
This is an aggregate function, analogous to the standard SQL aggregate
function sum . While sum operates on numeric values, orsum operates on
bit strings, or-ing together each input value to provide the result. This
example uses the PostgreSQL bit data type and the native function bitor
to or together two bit strings.
Create Aggregate orsum (
Basetype = bit,
Sfunc = bitor,
Stype = bit
);
A.4.3 Tanimoto
The tanimoto function is used to compute the Tanimoto similarity of
two bitstrings. The input bit strings would have been computed with the
public166keys function or another equivalent fragment key or finger-
print function.
Search WWH ::




Custom Search