nci_h23
Column                  Type
sid                     INTEGER
ext_datasource_regid    INTEGER
cid                     INTEGER
activity_outcome        INTEGER
activity_score          INTEGER
activity_url            TEXT
assaydata_comment       TEXT
assaydata_revoke        TEXT
log_gi50_M              NUMERIC
log_gi50_ugml           NUMERIC
log_gi50_v              NUMERIC
indngi50                INTEGER
stddevgi50              NUMERIC
logtgi_m                NUMERIC
logtgi_ugml             NUMERIC
indntgi                 INTEGER
stddevtgi               NUMERIC
Figure 6.2 Entity-relationship diagram for nci_h23 data table.
This section will show how these files can be used to populate a schema of
tables designed for PubChem data. While your chemical information may
not correspond exactly to this schema, it should be instructive to see how
the PubChem schema is designed and used.
6.4.1 BioAssay Data
PubChem BioAssay is available as hundreds of different files.³ The files
are named, for example, 1.csv.gz, 1.descr.xml, 2.csv.gz, 2.descr.xml. The
xml files are descriptions of the data contained in the corresponding csv
file, which results when the csv.gz file is unzipped. For example, the file
1.descr.xml contains the information: “Growth inhibition of the NCI_H23
human Non-Small Cell Lung tumor cell line is measured as a screen for
anti-cancer activity” as well as information about the various columns of
data in the 1.csv file. This information is used to define a table to hold the
data in the 1.csv file. Figure 6.2 shows a representation of the table, named
nci_h23. Using additional information in the 1.descr.xml file and the
ability of the RDBMS to attach comments to tables and columns, the
following SQL defines the nci_h23 table.
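Before turning to that definition, the unzip-and-inspect step described above can be sketched in a few lines of Python. This is only an illustration, not part of the PubChem tooling: the function name read_csv_header is hypothetical, the sample data is made up, and the sketch assumes that the first row of the csv file holds the column names.

```python
import csv
import gzip
import io


def read_csv_header(source) -> list:
    """Return the first row of a gzip-compressed CSV file.

    `source` may be a file name such as "1.csv.gz" or a binary file object.
    Assumption: the first row of the file holds the column names.
    """
    # Mode "rt" decompresses and decodes to text in one step.
    with gzip.open(source, mode="rt", newline="") as handle:
        return next(csv.reader(handle))


# Stand-in for 1.csv.gz: a tiny in-memory file with made-up values.
sample_csv = "sid,cid,activity_outcome,activity_score\n1,2,1,10\n"
sample_gz = gzip.compress(sample_csv.encode("utf-8"))

columns = read_csv_header(io.BytesIO(sample_gz))
print(columns)
```

With a real download, the call would be read_csv_header("1.csv.gz"). Once the column names are known, together with the descriptions in 1.descr.xml, the table definition can be written as: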
Create table pubchem.nci_h23
(
"sid" Integer,
"ext_datasource_regid" Integer,
"cid" Integer,
"activity_outcome" Integer,
"activity_score" Integer,
"activity_url" Text,