Chemistry Reference
In-Depth Information
as necessary to provide all compound ids for any substance id. Adding
a column for compound association type allows that information to be
included as well. Figure 6.4 is an ERD showing how all the PubChem tables
are related. The search from the previous section can now be expanded to
include compound data.
Select
n.sid, sc.compound_id, sc.compound_type, c.openeye_can_smiles ,
s.ext_datasource_name, s.ext_datasource_regid,
n.activity_outcome, n."log_gi50_M", n.log_gi50_ugml
From
compound c
Join substance_compound sc On sc.compound_id = c.cid
Join substance s On s.substance_id = sc.substance_id
Join nci_h23 n On n.sid = s.substance_id
Where activity_outcome = 2
Order By sid;
Note two additional Join c l au s e s, e ac h w it h t h e ap pr o pr i at e On c l au s e n a m -
ing the columns that relate the tables being joined. The additional columns
compound _ id , compound _ type , and openeye _ can _ smiles are
from the compound table. No columns are actually selected from the sub-
stance _ compound table. That table is simply used to affect the many-
to-many relationship between the substance and compound tables.
6.5 Data Constraints and Data Integrity
Several constraints on data have already been discussed. When a column is
defined to be numeric , it is forbidden to insert anything other than a num-
ber into that column. There are constraints of this type on every standard
SQL data type. This ensures a type of data integrity. When data is selected
from a column with data type timestamp , the user of the data, whether
a person or a computer program can be sure the data represents a valid
timestamp . This type of data constraint also prevents errors from creeping
into the database due to errors on data input. For example, the string 'Nan'
would not be allowed in a column of type numeric. While some computer
systems and languages freely use the string 'Nan' to represent (on output)
“not a number,” an RDBMS would reject this and never allow anything
other than a valid number (or possibly a null) in a numeric column.
Another commonly used constraint is the uniqueness constraint.
In previous examples, the column compound _ id was defined to be a
unique integer. When the uniqueness constraint is used in a table holding
a collection of compounds, it ensures that there can never be more than
one compound with a particular compound _ id . This is essential if other
data about a compound are stored in other tables that use compound _ id
as a foreign key. Notice that this does not prevent two identical compounds
Search WWH ::




Custom Search