The purpose of the composition table is to record that a sample consists
of one or many compounds. There may be many rows in composition, each
with the same sample_id and a different compound_id. Or there may be a
single entry in the composition table, indicating that a sample contains
only one compound. Finally, there may be no entry in composition for a
sample_id, indicating that the sample is of unknown composition.
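The three cases just described (many rows, one row, no rows) can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the sample and compound tables are reduced to the bare minimum, the name column and the helper compounds_in_sample are hypothetical, and the rows are made-up placeholders.

```python
import sqlite3

# In-memory database; table layouts are simplified assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample   (sample_id   INTEGER PRIMARY KEY);
    CREATE TABLE compound (compound_id INTEGER PRIMARY KEY, name TEXT);
    -- One row per (sample, compound) pair.
    CREATE TABLE composition (
        sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
        compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
        PRIMARY KEY (sample_id, compound_id)
    );
    INSERT INTO sample VALUES (1), (2), (3);
    INSERT INTO compound VALUES (10, 'water'), (11, 'ethanol');
    INSERT INTO composition VALUES (1, 10), (1, 11);  -- sample 1: a mixture
    INSERT INTO composition VALUES (2, 10);           -- sample 2: one compound
    -- sample 3 has no composition rows: unknown composition
""")

def compounds_in_sample(sample_id):
    """Return the compound_ids recorded for a sample (empty list if unknown)."""
    rows = conn.execute(
        "SELECT compound_id FROM composition "
        "WHERE sample_id = ? ORDER BY compound_id",
        (sample_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling compounds_in_sample for samples 1, 2, and 3 returns a two-element list, a one-element list, and an empty list, respectively, matching the three cases.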
This schema can be expanded in many ways. For example, other
information about the sample can be added, such as whether the sample
is a liquid, crystal, solution, etc. If necessary, a table might be used to store
the sample_ids of toxic or radioactive compounds, or of compounds
monitored by some governmental regulatory agency. Rather than trying
to foresee all possibilities and add columns to the sample table, it is much
simpler and more robust to add new tables as new information becomes
available or necessary.
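As a rough sketch of extending a schema by adding a table rather than a column, the example below attaches hazard information to samples without altering the sample table. The table name sample_hazard, its hazard column, and the helper hazards_by_sample are hypothetical names chosen for illustration, and the rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The original, minimal sample table.
conn.executescript("""
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY);
    INSERT INTO sample VALUES (1), (2), (3);
""")

# A later requirement: flag hazardous samples.
# A new table is added; the sample table itself is left unchanged.
conn.executescript("""
    CREATE TABLE sample_hazard (
        sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
        hazard    TEXT    NOT NULL,
        PRIMARY KEY (sample_id, hazard)
    );
    INSERT INTO sample_hazard VALUES (2, 'toxic'), (2, 'radioactive');
""")

def hazards_by_sample():
    """List every sample with its hazards; hazard is None when none is recorded."""
    return conn.execute("""
        SELECT s.sample_id, h.hazard
        FROM sample AS s
        LEFT JOIN sample_hazard AS h ON h.sample_id = s.sample_id
        ORDER BY s.sample_id, h.hazard
    """).fetchall()
```

Samples with no hazard rows need no placeholder values, and a sample can carry several hazards, which a single extra column on sample could not express as cleanly.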
A general rule is this: Keep each table as simple as possible, with the
fewest columns, each of which is essential to describe the entity
(e.g., sample, compound, or chemist). Assign a unique integer id column
and use that id to relate the table to other tables containing more
information or related information.
6.4 Schemas for PubChem Data
In the previous section, a schema was described for a compound tracking
system based on user specifications. The designer is free to create new
schemas and tables to fit the user specifications. Sometimes, an existing
system needs to be analyzed in order to fit into an RDBMS model. Often,
the system will have been implemented using several sets of files, with
various programs implementing relationships among these files and the
data in them. In this case, the structure of the schemas and tables is
“suggested,” or even required, by the structure of the existing data.
The U.S. National Institutes of Health PubChem project contains
information on millions of chemical compounds. 1 The data are divided into
three main sections. PubChem Substance contains structures supplied by
depositors. PubChem Compound contains unique structures with
computed properties. PubChem BioAssay contains bioactivity assay results
supplied by depositors. The data in these three sections are recorded
independently, yet there are chemical relationships among these sections. For
example, information available as a PubChem BioAssay is associated with
a particular substance for which the data were collected. A substance may
be a single compound or a mixture of several compounds.
Search tools are available online for finding structures and data in
PubChem. 2 These may suffice for your needs. The data are also
available in the form of SDF files and CSV (comma-separated values) files.
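As a minimal sketch of how comma-separated data might be loaded into a relational table, the example below parses CSV text with Python's csv module and inserts it using sqlite3. The column names, the table name substance_compound, the helper compounds_for, and the values are all invented placeholders; they do not reflect the layout of any actual PubChem file.

```python
import csv
import io
import sqlite3

# Placeholder text standing in for a downloaded CSV file.
# Column names and values are invented, not real PubChem records.
csv_text = "substance_id,compound_id\n1,100\n1,101\n2,100\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE substance_compound ("
    "substance_id INTEGER NOT NULL, compound_id INTEGER NOT NULL)"
)

# Parse each CSV row into a tuple of integers, then bulk insert.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(int(r["substance_id"]), int(r["compound_id"])) for r in reader]
conn.executemany("INSERT INTO substance_compound VALUES (?, ?)", rows)
conn.commit()

def compounds_for(substance_id):
    """Return the compound ids linked to one substance id."""
    result = conn.execute(
        "SELECT compound_id FROM substance_compound "
        "WHERE substance_id = ? ORDER BY compound_id",
        (substance_id,),
    ).fetchall()
    return [row[0] for row in result]
```

For a real file, io.StringIO(csv_text) would be replaced by an open file handle, and the column names would need to match the file's actual header.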